

THE BRITISH 
JOURNAL OF 
EDUCATIONAL 
PSYCHOLOGY 


NOVEMBER 1987 


VOL. 57, PART 3.




Edited by 
Professor HAZEL FRANCIS 


Assistant Editors:
Dr. H. C. M. CARROLL
Prof. C. W. DESFORGES
Dr. D. FONTANA
Dr. F. J. TAYLOR

Sub-Editor:
Dr. S. CUNNINGHAM

Book Review Editor:
Prof. C. W. DESFORGES


Advised by the Editorial Committee: 
Professor G. BROWN Ms. I. LUNT 
Dr. C. LEACH Professor P. W. ROBINSON 
Mr. E. WILKINSON 




The British Journal of Educational Psychology is issued by the British Psychological Society
once a term, in February, June and November. The subscription is £18.00 a year post free, payable
in advance, or £6.50 each number.


Subscriptions and Orders should be sent to Scottish Academic Press, 33 Montgomery Street, 
Edinburgh, EH7 5JX. Members of the British Psychological Society receive the Journal on special 
terms. 


Papers for publication in duplicate should be sent to Professor Hazel Francis, University of
London Institute of Education, 24-27 Woburn Square, London, WC1H 0AA. Guidance to
contributors on the preparation of papers for the Journal is given at the end of the November
number each year.


All Books for Review for The British Journal of Educational Psychology should be sent to: 
Prof. C. W. Desforges, University of Exeter, St Luke’s, Heavitree Road, Exeter, EX1 2LU.


Advertisements for this Journal should be sent to Scottish Academic Press.


Published by 


SCOTTISH ACADEMIC PRESS, 33 MONTGOMERY STREET, EDINBURGH, EH7 5JX, 
for the British Journal of Educational Psychology, Ltd. 


All rights reserved 


© 1987. The British Journal of Educational Psychology


PRINTED IN GREAT BRITAIN BY LINDSAY & CO LTD., EDINBURGH 


Br. J. educ. Psychol., 57, 265-278, 1987


A COMPARISON OF TWO PROCEDURES FOR TEACHING
DISCRIMINATION SKILLS TO DOWN’S SYNDROME AND
NON-HANDICAPPED CHILDREN


By LOUISE DUFFY AND JENNIFER G. WISHART
(Department of Psychology, University of Edinburgh) 


Summary. Two different strategies for teaching discrimination to Down’s Syndrome (DS) 
and non-handicapped children were compared for relative efficiency: trial-and-error and 
errorless learning. Two types of discrimination tasks were used, shape and nonsense 
figure tasks. A pre-test was used to match children for pre-existing ability. Errorless 
learning proved to be the superior training strategy in each group, both during training 
and in post-tests. DS children responded poorly to trial-and-error training in both 
absolute and relative terms. Although order of presentation of training conditions had 
little effect on performance in the non-handicapped group, an interesting differential 
effect emerged in the DS group: initial trial-and-error training adversely affected 
subsequent performance in the errorless task while initial errorless experience enhanced 
subsequent trial-and-error performance. It would appear from these results that errorless 
learning may be useful as a ‘‘primer’’, increasing motivation to learn in more
conventional learning situations.


INTRODUCTION 

Down’s Syndrome (DS) is the single greatest cause of mental retardation in the UK,
accounting for nearly one third of children with severe mental handicap. Huge 
advances in our understanding of the metabolic and biochemical abnormalities 
associated with DS have been made recently (see Smith, 1985) but these are unlikely 
to benefit those children already born with DS. Prenatal screening has made some 
inroads into incidence rates but medical and economic factors at present limit the 
use of screening to women already known to be at risk, that is, women over 35 or 
women who have already given birth to a child with DS; even within this high risk 
group, it would appear that a significant proportion are either not offered screening 
or do not take up that offer (Walker and Howard, 1986). This at-risk group 
accounts in any case for only one third of all DS births. Prevention of DS by use of 
prenatal screening techniques is still, therefore, a distant prospect. Recent studies,
in fact, show little decline in overall incidence rates, with figures suggesting an 
increase in infants born to mothers in younger age groups (Abroms and Bennett, 
1983; Stratford and Steele, 1985). Prevalence, additionally, has increased fourfold
within the last generation as a result of advances in medical technology, particularly 
in paediatric cardiology (Kirman, 1983). 


As long as prevalence and incidence figures remain high, it would be unwise to 
abandon psychological approaches to the study of DS on the grounds that future 
discoveries in the medical field may ameliorate — or even eliminate — the condition. 
The anomalies associated with DS are expressed behaviourally. Psychology can, 
therefore, contribute much to attempts at facilitative intervention. It seems essential 
that we continue to try to learn more about the exact nature of the mental handicap 
found in children with DS. It seems particularly important that we attempt to find 
some way to counteract the progressive decline in rate of mental development 
generally found with increasing age (Gibson, 1978). 


Recent years have seen a significant impact on development as a result of DS 
children being brought up in the parental home rather than in an institutional setting 
(Centerwall and Centerwall, 1960; Carr, 1975, 1985). In normal children, variations
in ability, particularly in IQ, are often associated with factors such as parental IQ
and social class. This relationship does not generally hold true in DS (Gibson, 1978; 
Carr, 1985), although, interestingly, exceptions to this trend have recently been 
reported (Sharev et al., 1985; Cunningham, 1986). Given the wide variation in 
learning ability found in children with DS and this absence of any clear,
straightforward relationship with environmental factors usually closely associated with
developmental outcome, it would seem that other, less obvious factors must be 
influencing development in DS. If these could be identified, appropriate fine- 
tailoring of the environment to the particular needs and skills of the DS child might 
well improve on presently achieved levels of development. It is important, however, 
to be realistic about what may be achieved: the genetic component in DS must 
inevitably set some upper limit on improvement in developmental outcome. 
Nonetheless, there seems reason to believe that with appropriately sensitive teaching 
methods, more children with DS could be encouraged to develop to their full 
potential. 


Two distinct psychological approaches to research into learning difficulties in 
mentally handicapped children dominate the literature, one based in cognitive 
theories of mental handicap, the other emphasising motivational deficits. The 
cognitive approach emphasises deficits in cognitive functioning as the source of 
learning difficulties: development in the mentally handicapped is generally 
characterised as identical in nature to cognitive development in normal children, 
with only rate and end-point of development differentiating the two populations 
(Illingworth, 1980). In contrast, motivational theorists emphasise the role of 
motivational deficiencies, often classifying the mentally handicapped as ‘‘failure 
avoiders’’ (Cromwell, 1967): more motivated to avoid failure than to achieve
success, the mentally handicapped are characterised as avoiding learning situations 
and responding poorly to traditional trial-and-error teaching methods (for reviews, 
see Zigler and Balla, 1982). The motivational and cognitive approaches outlined 
rarely overlap in any investigation of learning processes in mentally handicapped 
children. It is possible, however, that the cognitive deficits seen in the DS child may 
to some extent result from motivational problems: frequent early experience of
failure may erode motivation to learn, thereby contributing directly to subsequent 
deficits in functioning. 


It is, of course, impossible to examine motivation or cognition in isolation from 
each other. Terms such as ‘‘cognitive style’’ acknowledge this. However, different
teaching strategies place differing emphases on these two performance factors. The 
study reported here sought to compare two different methods for teaching 
discrimination to young children with DS: trial-and-error learning and errorless 
learning. 


Errorless learning is the most widely-used strategy for teaching new skills to the 
mentally handicapped. It has proved highly effective in teaching adolescents 
practical discrimination skills, skills previously unlearned when conventional 
methods were used (Cullen, 1976; McIvor and McGinley, 1983; Adams, 1984). To 
some degree, its adoption implies recognition of the possibility that motivational 
factors may influence cognitive outcome, but generally its use stems from the more 
pessimistic belief that the experience of erring in a learning situation will always be 
counter-productive in the mentally handicapped, adding to already existing 
cognitive problems. 


Despite the widespread use of errorless teaching strategies with all ages of 
handicapped children, few published studies have directly compared its efficiency 
with more traditional teaching methods. The most interesting papers generally 
report single, older case studies and often only anecdotal evidence of failure of other 
teaching methods is given. Many learning theorists — in particular, Piaget (1936,
1937, 1950) — would argue that erring is an essential element in the learning process 
and that error has as much to contribute to processes of cognitive development as 
success. Evidence of poor learning ability using traditional trial-and-error methods 
is not in itself evidence of inability to profit cognitively from mistakes: there has 
been little direct evidence to support the view that the mentally handicapped cannot
also learn from their own mistakes. It is possible, then, that adoption of an 
exclusively errorless approach throughout development may in some way be 
perverting or undermining the normal course of early learning. 


In the study presented here, the relative efficiency of errorless and trial-and- 
error methods in training discrimination skills in both DS and non-handicapped 
children was investigated. Two sets of tasks were used: shape discrimination and 
nonsense figure discrimination tasks. The use of nonsense figure tasks served as a 
control for any positive or negative effects that differential prior learning experience 
with the target concept might have on test performance; balancing of order of 
presentation of the two training conditions allowed examination of any within- 
session effects of exposure to the two kinds of learning situation. 


Three main hypotheses were tested: 


(1) That errorless learning would have a greater enhancing effect on 
performance than trial-and-error learning and that that effect would be 
greater in DS than in non-handicapped children. 

(2) That initial experience of errorless learning would enhance performance on 
a subsequently presented trial-and-error task and that this enhancement 
would be greater in DS children. 

(3) That initial experience of trial-and-error learning would adversely affect 
performance on a subsequently presented errorless task and that this effect 
would be present in DS children only. 


METHOD 


Sample matching 

Direct mental age (MA) matching of handicapped and non-handicapped 
children was avoided for a number of reasons, both practical and theoretical. The 
validity of MA matching in handicap studies has frequently been questioned (Clarke 
and Clarke, 1975; Woodward, 1979; Wishart, 1986a). The MA composite is arrived 
at by simple addition of scores on a number of test items. This means that two 
children with widely differing ability profiles can achieve identical MAs. Even in the 
normal population, some ‘‘matches’’ must for some purposes be inaccurate. Even 
more vulnerable to criticism are MA matches of mentally-handicapped children to 
much younger non-handicapped children; the greater the chronological age gap, the 
more widely-differing the individual learning histories of the MA-matched children 
and the more inexact — and less meaningful — any such ‘‘match’’ is likely to be 
(Hogg and Moss, 1983).


More importantly, perhaps, acceptance of psychometrically-based matching 
procedures implies acceptance of one particular theory of mental development in the 
handicapped, the ‘‘slow development” theory. This theory maintains that processes 
of cognitive development in the handicapped and non-handicapped are identical in 
nature, with only rate and end-point of development differing (see above). Recent 
research suggests, however, that learning in DS children may be radically different 
from that seen in normal children, with important qualitative as well as quantitative 
differences in learning processes existing in the two populations (Morss, 1983, 1985; 
Wishart, 1986b). Use of MA-matching involves accepting, moreover, that 
motivational factors do not differentially influence performance in the two groups,
that performance and competence are similarly linked in both populations. Again, 
there is evidence to suggest that this may not be the case (Byck, 1969; Balla and 
Zigler, 1979; Shaw, 1986; Wishart, 1987). 


Given the particular nature and aims of the study to be reported here, it seemed 
important to include neither of these theoretical assumptions in the experimental
design. Children were therefore matched, not on MA, but on the basis of ability to 
pass a pre-test based on the discrimination tasks to be used in the experiment itself 
(see below). 


Down's Syndrome group. Thirteen children were selected from two Edinburgh 
special schools. Three were excluded on the basis of the pre-test (see below) and two 
had to be dropped because of inability to concentrate for long enough to take part 
effectively in the experiment. This left eight DS children, five males and three 
females, who completed the experiment (mean age: 7 years 9 months; SD: 14.6 
months). 


Non-handicapped group. Thirteen children were selected from two Edinburgh 
nursery schools. Four were excluded because they could already pass the pre-test, and one
further child had to be excluded later in the experiment because of attention difficulties. Mean
age of the remaining eight non-handicapped children was 2 years 6 months (SD: 4 
months). There were four males and four females. 


Procedure 

Testing of all children took place in a small room, as free from distractions as 
could be arranged within the school settings. Child and experimenter sat opposite 
each other at a small table. 


Selection pre-test. One of the two target stimuli (a rectangle or oval) was 
presented together with two alternative stimuli: one ‘‘orthodox’’ shape (circle, 
square, triangle or parallelogram) and one ‘‘unorthodox’’ shape, introduced to limit
the possibility that the oval or rectangle would be ‘‘correctly’’ identified by a process 
of elimination of more familiar, known shapes (see Figure 1). 


FIGURE 1 
TEST CARDS USED IN SUBJECT SELECTION PRE-TEST


All shapes were of different colours and approximately 7 cm on their longest
dimension. Each was centred on a white card, 12 cm x 8 cm. Position of shape to be 
identified was randomised over trials. Children were told: ‘‘I am going to show you 
three cards with shapes on them. When I say the name of a shape, I want you to 
point to the card with that shape on it." 
A minimum of 28 trials were given, a minimum of six of each target stimulus
and four of each orthodox shape. Further trials were given as necessary until it could 
be established whether correct responses were due to random guessing or true ability 
to discriminate the shape in question. Responses were not differentially reinforced in 
this pre-test. Children were selected on the basis that they showed an ability to 
discriminate some shapes but were not yet able to discriminate rectangles and ovals. 
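The paper does not state the rule used to decide whether a child's correct responses reflected discrimination rather than guessing. One conventional criterion, sketched here purely as an assumption, is a one-tailed binomial test: with three cards per trial, a child choosing at random points to the named shape with probability 1/3. The function names and the 0.05 threshold below are illustrative, not the authors' procedure.

```python
from math import comb


def p_at_least(k: int, n: int, p: float = 1 / 3) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more hits by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def discriminates(k: int, n: int, alpha: float = 0.05) -> bool:
    """Hypothetical criterion: k correct out of n counts as real discrimination
    when guessing alone would produce k or more hits with probability below alpha."""
    return p_at_least(k, n) < alpha


# Six presentations of one target shape, the minimum in the selection pre-test.
for hits in range(7):
    print(hits, round(p_at_least(hits, 6), 3), discriminates(hits, 6))
```

With six presentations of a target shape, chance can be ruled out at this threshold only with five or six correct responses; four correct out of six still has a guessing probability of about 0.10. This illustrates why a short run of trials can leave the question open, so that further trials may be needed.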


All children chosen on the basis of the selection pre-test went on to be presented 
with the pre-tests, training and post-tests of the shape discrimination tasks, followed 
by the training and post-tests of the nonsense figure discrimination tasks. (No pre-test
was required for either of the nonsense figure tasks since no prior knowledge
could safely be assumed.) Both groups of children were divided into two subgroups,
and order of presentation of errorless and trial-and-error training within each
discrimination task was balanced over subgroups and for each child.
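A minimal sketch of one counterbalancing scheme consistent with this description, offered as an assumption: children alternate between two order subgroups, the two subgroups receive errorless and trial-and-error training in opposite orders, and each child's order is reversed between the shape and nonsense figure tasks. The subgroup labels in Tables 1 and 3 suggest such a reversal if subject numbers refer to the same children, which the paper does not confirm. The alternation rule and all names are hypothetical; in practice subgroup membership would normally be randomised.

```python
from typing import Dict, List, Tuple

CONDITIONS = ("errorless", "trial-and-error")
TASKS = ("shape", "nonsense figure")  # shape tasks were always presented first

Plan = List[Tuple[str, Tuple[str, str]]]


def assign_orders(children: List[str]) -> Dict[str, Plan]:
    """Hypothetical scheme: alternate children between subgroups A and B, reverse the
    training order between subgroups, and reverse it again between the two tasks."""
    schedule: Dict[str, Plan] = {}
    for idx, child in enumerate(children):
        subgroup = idx % 2  # 0 = subgroup A, 1 = subgroup B
        plan: Plan = []
        for task_idx, task in enumerate(TASKS):
            first = (subgroup + task_idx) % 2  # index of the condition given first
            plan.append((task, (CONDITIONS[first], CONDITIONS[1 - first])))
        schedule[child] = plan
    return schedule


for child, plan in assign_orders(["child 1", "child 2"]).items():
    print(child, plan)
```

Under this scheme half of each group meets errorless training first on the shape task and trial-and-error training first on the nonsense figure task, and the other half the reverse.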


(A) Shape discrimination tasks 
Trial-and-error training used an oval as the target stimulus, errorless training a 
rectangle. 


Pre- and post-tests. Seven sets of cards, identical in format to those used in the 
selection pre-test, were used. (Initial use of 10 trials in both the pre- and post-test 
resulted in an unacceptably long procedure.) The target stimulus varied in size, 
colour and proportion over trials in order to represent the concept of rectangle or 
oval in its most general form. Position of target stimulus was randomised over trials. 
Instructions to children were as in the selection pre-test. 


(i) Trial-and-error training (oval). As in the pre-test, an oval was presented 
together with a common and an unorthodox shape. Fifteen trial sets of three cards were
used. Colour but not size was varied over trials. Order of presentation of the 15
card sets was randomised.


Children were told: ‘‘I am going to show you three cards with shapes on them 
just as I did the last time and I want you to point to the card with the shape I ask for. 
This time I will tell you if you are right.” Correct responses were verbally praised. 
Incorrect responses were negatively reinforced; children were told: ‘‘No, that is not
right. Try again the next time.’’
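For contrast with the errorless procedure, the trial-and-error session can be sketched as a fixed run of 15 presentations in which the target's position is randomised and every response, right or wrong, receives a single feedback message before the next card set. The simulated responder, the seeded random generator and the praise wording are assumptions for illustration; only the correction phrase is quoted from the text.

```python
import random
from typing import Callable, List, Tuple

PRAISE = "Good!"  # hypothetical wording: the paper says only that correct responses were praised
CORRECTION = "No, that is not right. Try again the next time."  # quoted from the text


def run_trial_and_error(choose: Callable[[int], int], n_trials: int = 15,
                        seed: int = 0) -> List[Tuple[bool, str]]:
    """Present n_trials three-card sets with the target in a random slot (0, 1 or 2).
    `choose(target_slot)` simulates the child's pointing response and returns a slot.
    Each response gets one feedback message; errors are never re-presented."""
    rng = random.Random(seed)
    record = []
    for _ in range(n_trials):
        target_slot = rng.randrange(3)  # target position randomised over trials
        correct = choose(target_slot) == target_slot
        record.append((correct, PRAISE if correct else CORRECTION))
    return record


# A simulated child who always points to the left-hand card, whatever is shown.
left_pointer = run_trial_and_error(lambda target_slot: 0)
print(sum(ok for ok, _ in left_pointer), "correct out of", len(left_pointer))
```

Because session length is fixed at 15, the trial-and-error score is simply the number correct out of 15; a child with a position bias, like the simulated one here, scores only on trials where the target happens to fall in the favoured slot.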


(ii) Errorless training (rectangle). Fifteen trial sets of three cards were used. The trials
were presented in an order by which two alternative stimuli to the rectangle were 
gradually ‘‘faded in’’, increasing in size while varying in shape over trials (see 
Figure 2). 


FIGURE 2 


EXAMPLES OF ERRORLESS LEARNING TEST CARDS 
(SHAPE DISCRIMINATION — RECTANGLE) 
In the first two sets, the target stimulus was presented with two blank cards. In
these and all subsequent trials, position of the target stimulus was randomised over 
trials, as was colour, the latter a precaution against misidentification of this as a 
relevant attribute of the target stimulus. Trials 3 and 4 consisted of a rectangle with 
two similarly coloured but much smaller (0.5 cm) alternative shapes. The dimensions 
of the alternative shapes were gradually increased over trials 5-13. The final two 
trials consisted of three red shapes, a rectangle and two alternatives of equivalent 
dimensions. 
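The fading sequence just described can be sketched as a size schedule for the two alternative shapes. The text fixes trials 1-2 (blank cards), 3-4 (0.5 cm alternatives) and 14-15 (equivalent dimensions) but not the increments over trials 5-13; the sketch assumes equal steps and a full size of 7 cm, borrowed from the approximate dimension of the pre-test shapes. Both values are assumptions.

```python
from typing import List, Optional

TARGET_CM = 7.0  # assumption: full size taken from the ~7 cm pre-test shapes
START_CM = 0.5   # alternatives on trials 3-4 (from the text)
N_TRIALS = 15


def alternative_size(trial: int) -> Optional[float]:
    """Longest dimension (cm) of the two alternative shapes on a 1-based trial.
    None means the alternatives are blank cards."""
    if not 1 <= trial <= N_TRIALS:
        raise ValueError("trial must be in 1..15")
    if trial <= 2:
        return None          # target shown with two blank cards
    if trial <= 4:
        return START_CM      # very small alternatives
    if trial >= 14:
        return TARGET_CM     # final two trials: equivalent dimensions
    # Trials 5-13: assumed equal steps from START_CM (trial 4) to TARGET_CM (trial 14).
    step = (TARGET_CM - START_CM) / (14 - 4)
    return round(START_CM + step * (trial - 4), 2)


schedule: List[Optional[float]] = [alternative_size(t) for t in range(1, N_TRIALS + 1)]
print(schedule)
```

Under these assumptions the alternatives grow in steps of 0.65 cm, from 1.15 cm on trial 5 to 6.35 cm on trial 13.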


Verbal instructions to the children were identical to those given in the trial-and- 
error training. Verbal praise was given each time the child made a correct response. 
Errors were not commented on but, rather than proceeding with the next trial, the 
previous trial was re-presented, this procedure being repeated as necessary, until the 
child had shown mastery of that particular step in the training sequence. 
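Unlike trial-and-error training, this correction procedure makes the number of presentations depend on the child. A minimal sketch follows, assuming that ‘‘mastery’’ means a fixed run of consecutive correct responses at a step (the paper does not define it; one correct response is the default here) and using a simulated responder in place of the child. The session cap is a practical safeguard not mentioned in the text.

```python
from typing import Callable, Tuple


def run_errorless(respond: Callable[[int], bool], n_steps: int = 15,
                  needed: int = 1, max_presentations: int = 100) -> Tuple[int, int]:
    """Run the errorless sequence; `respond(step)` returns True for a correct choice.
    Errors are not commented on: the same step is re-presented until `needed`
    consecutive correct responses have been made (assumed mastery criterion).
    Returns (correct responses, total responses)."""
    correct = total = 0
    for step in range(1, n_steps + 1):
        streak = 0
        while streak < needed:
            if total >= max_presentations:
                raise RuntimeError("session cap reached")  # safeguard, not in the paper
            total += 1
            if respond(step):
                correct += 1
                streak += 1  # praised; counts toward mastery of this step
            else:
                streak = 0   # no comment; re-present the same card set


    return correct, total


script = iter([True] * 6 + [False] + [True] * 9)  # errs once, at step 7
print(run_errorless(lambda step: next(script)))
```

Here a child who errs once, at step 7, makes 16 responses in all, 15 of them correct.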


(B) Nonsense figure discrimination tasks 
Trial-and-error training used ‘‘nims’’ as the target stimulus, errorless training 
used ‘‘wugs’’ (see below). 


Pre-tests. No pre-tests were presented since no prior knowledge of the concept 
of ‘‘nims’’ or ‘‘wugs’’ could be assumed.


(i) Trial-and-error training (‘‘nims’’): 15 trial sets of three cards were presented, 
each showing a ‘‘nim’’ and two other nonsense figures of equivalent size and colour 
(see Figure 3). 


FIGURE 3 


EXAMPLES OF TRIAL-AND-ERROR LEARNING TEST CARDS 
(NONSENSE FIGURE DISCRIMINATION — NIMS)


Children were shown a drawing of a Mr Plimp, a ‘‘nim’’-like figure, and
told that he had some friends called ‘‘nims’’ in the pack of cards on the table. The
aim of the game was to help Mr Plimp to find all the ‘‘nims’’. They were then told:
‘‘I am going to show you three cards with little men on them. I want you to point to
the card you think has Mr Nim on it.’’ Correct responses were verbally praised on
behalf of Mr Plimp and when an incorrect response was made, children were told to 
try again next time. 








(ii) Errorless training (‘‘wugs’’): 15 trial sets of three cards were used. 
Alternative stimuli to the ‘‘wug’’ were faded in over trials in exactly the same way as 
in the errorless rectangle training.


Post-tests. Post-test trials consisted of seven test card sets, each with either a 
‘‘nim’’ or a ‘‘wug’’ and two other similarly sized nonsense figures.
RESULTS


All responses in the pre-tests, training and post-tests were used in the analysis. 
Scores in the pre- and post-tests were expressed in terms of the number of correct 
responses made. Since the number of trials presented in the errorless procedure was 
child-determined, scores in the errorless and trial-and-error training periods were 
expressed as percentages (correct responses/total responses). 
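This conversion puts the two conditions on a common scale even though the denominator is fixed at 15 in trial-and-error training but varies from child to child in errorless training. A trivial sketch; the function name is illustrative.

```python
def percent_correct(correct: int, total: int) -> float:
    """Training score expressed as correct responses / total responses x 100."""
    if total <= 0:
        raise ValueError("total responses must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return 100.0 * correct / total


print(percent_correct(9, 15))   # trial-and-error: 9 correct of a fixed 15 responses
print(percent_correct(15, 16))  # errorless: 15 correct out of 16 responses
```

These give 60.0 and 93.75 respectively.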


Shape discrimination 

Pre-test trials. Pre-test scores on both shape tasks were compared in the DS and 
non-handicapped groups. No significant differences were found, thereby validating 
the procedure adopted for sample selection and matching (oval (trial-and-error): 
t = 1.078, df (14), NS; rectangle (errorless): t = 1.56, df (14), NS). There were also 
no differences within groups in pre-test performance on the two discrimination tasks 
(DS: t = 0.74, df (7), NS; non-handicapped: t = 0.42, df (7), NS). 
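The reported degrees of freedom (14 for the between-group comparisons, 7 for the within-group ones) are what Student's pooled-variance t-test for two independent groups of eight and a paired t-test on eight within-child differences would give. The paper does not describe its computations, so the sketch below is an assumption; applied to the pre-test columns of Table 2 it returns t ≈ 1.08, 1.56, 0.74 and 0.42, in line with the values above.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def independent_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's t for two independent groups, pooling the two sample variances.
    Returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t on the within-child differences x - y. Returns (t, degrees of freedom)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    t = mean(d) / (stdev(d) / sqrt(len(d)))
    return t, len(d) - 1


# Pre-test scores (number correct out of 7) transcribed from Table 2.
ds_oval = [2, 5, 2, 1, 1, 2, 2, 1]
ds_rect = [0, 0, 4, 2, 1, 0, 1, 3]
nh_oval = [4, 2, 2, 7, 2, 1, 2, 3]
nh_rect = [1, 3, 2, 6, 4, 2, 2, 1]

print(independent_t(nh_oval, ds_oval))  # DS vs non-handicapped, oval
print(independent_t(nh_rect, ds_rect))  # DS vs non-handicapped, rectangle
print(paired_t(ds_oval, ds_rect))       # oval vs rectangle within the DS group
print(paired_t(nh_oval, nh_rect))       # oval vs rectangle within the non-handicapped group
```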


Training trials. Table 1 shows the percentage of correct responses during trial- 
and-error and errorless shape training trials for both the DS and the non- 
handicapped children. Performance in the two training conditions differed 
significantly in the combined groups (t = 3.7, df (15), P < 0.005) and also in both 
groups taken separately (DS: t = 11.4, df (7), P < 0.0005; non-handicapped:
t = 4.8, df (7), P < 0.005), with errorless training scores exceeding trial-and-error
training scores in all three cases. 


T-tests comparing the trial-and-error training scores of children initially trained 
with errorless learning and children with no prior training revealed a significant 


TABLE 1 
PERCENTAGE OF CORRECT RESPONSES DURING SHAPE TRAINING TRIALS 





                               Errorless training    Trial-and-error training
Initial training   Subject     (rectangle)           (oval)

DS subjects
Errorless          1)          100                    60
                   2)          100                    46
                   3)          100                    66
                   4)          100                    33
Trial-and-error    5)           80                    40
                   6)           85                    46
                   7)          100                    33
                   8)          100                    40

Non-handicapped subjects
Errorless          1)           82                    60
                   2)           94                    60
                   3)           89                    46
                   4)          100                   100
Trial-and-error    5)          100                    66
                   6)           94                    33
                   7)          100                    26

difference in favour of those who had experienced the errorless training prior to
presentation of the trial-and-error training (t = 2.15, df (14), P < 0.025). A similar,
if weaker, effect existed in both groups when analysed separately (DS: t = 1.47,
df (6), P < 0.10; non-handicapped: t = 1.63, df (6), P < 0.10). No effect of order of
presentation of training conditions on errorless scores was found, although there
seemed to be a trend, in the DS group only, in favour of better performance in
errorless training in children with no prior experience of trial-and-error shape
learning (t = 1.70, df (6), P < 0.10).


Post-test trials. Table 2 shows the differential effects of the two training 
procedures on subsequent performance. Improvement in performance was 
calculated by comparing pre- and post-test scores. For the combined groups, 
performance improved more following errorless training (t = 1.78, df (15),
P < 0.05). This effect did not reach significance in either group, however,
although it came closest to doing so in the non-handicapped group (t = 1.76,
df (7), P < 0.10).


There was no overall effect of order of training procedures on post-test 
improvement. Only the trial-and-error scores in the DS group varied significantly 
with order of presentation, with prior errorless experience enhancing subsequent 
trial-and-error performance (t = 1.96, df (6), P < 0.05). Improvement in
performance in the two non-handicapped sub-groups was roughly equivalent in both
training conditions, irrespective of their order of presentation. 


TABLE 2 


SHAPE DISCRIMINATION — EFFECTS OF ERRORLESS AND TRIAL-AND-ERROR TRAINING ON SUBSEQUENT 
PERFORMANCE: PRE-/POST-TEST SCORE DIFFERENCES 





                             Errorless training        Trial-and-error training
                             scores (rectangle)        scores (oval)
Initial training   Subject   Pre-   Post-   Diff.      Pre-   Post-   Diff.

DS subjects
Errorless          1)         0      5       5          2      5       3
                   2)         0      7       7          5      7       2
                   3)         4      7       3          2      7       5
                   4)         2      4       2          1      6       5
Trial-and-error    5)         1      2       1          1      2       1
                   6)         0      5       5          2      5       3
                   7)         1      5       4          2      1      -1
                   8)         3      6       3          1      4       3

Non-handicapped subjects
Errorless          1)         1      5       4          4      7       3
                   2)         3      6       3          2      6       4
                   3)         2      6       4          2      3       1
                   4)         6      7       1          7      7       0
Trial-and-error    5)         4      7       3          2      7       5
                   6)         2      6       4          1      5       4
                   7)         2      7       5          2      2       0
                   8)         1      6       5          3      6       3

Nonsense figure discrimination

Training trials. Errorless and trial-and-error training scores are set out in Table 
3. Errorless training scores reliably exceeded trial-and-error training scores, both 
overall and in each group, although this difference in scores was less pronounced in 
the non-handicapped group (overall: t = 6.15, df (15), P < 0.0005; DS: t = 5.96,
df (7), P < 0.0005; non-handicapped: t = 3.31, df (7), P < 0.01).


TABLE 3 
PERCENTAGE OF CORRECT RESPONSES DURING NONSENSE FIGURE TRAINING TRIALS 





                               Errorless training    Trial-and-error training
Initial training   Subject     (wug)                 (nim)

DS subjects
Trial-and-error    1)           94                    40
                   2)          100                    26
                   3)           82                    13
                   4)           94                    80
Errorless          5)          100                    33
                   6)           94                    46
                   7)          100                    40
                   8)          100                    86

Non-handicapped subjects
Trial-and-error    1)          100                    33
                   2)          100                   100
                   3)           94                    66
                   4)          100                    73
Errorless          5)          100                    33
                   6)           94                    80
                   7)           94                    53
                   8)          100                    73


Comparison of trial-and-error training scores of children initially given the 
errorless task and children with no prior training showed no significant differences, 
either for the combined groups or for either group taken separately. Nor was there 
any effect of training order on errorless training scores, although, as in the shape 
task, a trend in favour of the DS subgroup given this training condition first, i.e. 
with no prior trial-and-error experience, was present (t = -1.48, df (6), P < 0.10).


Post-test trials. Since there were no pre-test scores, the differential effect of the 
two training strategies on nonsense figure discrimination was evaluated by 
comparison of post-test scores in the two training groups (see Table 4). Errorless 
learning proved superior in all comparisons, with the effect again more pronounced 
in the DS group than in the non-handicapped group (overall results: t = 3.6, df (15),
P < 0.005; DS group: t = 3.2, df (7), P < 0.01; non-handicapped group: t = 2.18,
df (7), P < 0.05). Although a trend in favour of initial errorless learning was evident
in the post-test scores of both the non-handicapped and DS groups, the differences 
failed to reach significance, either overall or for either group. 
TABLE 4
DIFFERENCE BETWEEN NONSENSE FIGURE POST-TEST SCORES AFTER ERRORLESS AND TRIAL-AND-ERROR
TRAINING

                               Errorless    Trial-and-error
Initial training   Subject     training     training          Diff.

DS subjects
Trial-and-error    1)          5            3                 2
                   2)          7            0                 7
                   3)          7            5                 2
                   4)          6            3                 3
Errorless          5)          7            0                 7
                   6)          6            0                 6
                   7)          7            7                 0
                   8)          7            7                 0

Non-handicapped subjects
Trial-and-error    1)          7            7                 0
                   2)          7            7                 0
                   3)          6            3                 3
                   4)          7            7                 0
Errorless          5)          6            3                 3
                   6)          6            5                 1
                   7)          6            4                 2
                   8)          7            7                 0

DISCUSSION 


This study aimed to investigate and compare the efficiency of errorless and 
trial-and-error methods in teaching discrimination skills to normal and mentally 
handicapped children. As would have been expected, the non-handicapped group 
performed better than the handicapped group under trial-and-error training 
conditions. This kind of training most closely approximates the conditions 
encountered in everyday, natural learning situations, situations in which the non- 
handicapped children, by definition, show superior learning skills. It was hoped, 
however, that the DS children might benefit more than the non-handicapped 
children from errorless training, in relative if not in absolute terms. Surprisingly 
perhaps, both groups of children appeared to benefit equally from errorless 
training, both during training and in the subsequently administered post-tests. While 
better training scores would have been expected almost by definition with errorless 
training, its enhancing effect on DS post-test scores is encouraging, particularly 
given that only one short training session was used. With larger groups and repeated 
training sessions, it would seem reasonable to hope for even stronger carry-over 
effects. 


From the above, it is clear that the two groups responded very similarly to 
errorless training but very differently to trial-and-error training. While the 
differential effect in favour of errorless training existed in both groups, it was more 
pronounced in the DS group, in both training and post-test score comparisons. 
Again, to some degree, this was to be expected. For whatever reasons — ability or
motivational differences — the trial-and-error training scores of the non- 
handicapped children would be predicted to be — and were — greater than those of
the handicapped children, thereby narrowing the difference between training scores 
achieved under the two conditions. No such simple explanation can account for the 
differential effect on post-test scores. After trial-and-error training, post-test scores 
of the DS children were markedly inferior to those achieved by the non-handicapped 
children; after errorless training, however, post-test scores on both discrimination 
tasks matched those achieved by the non-handicapped children. The enhancement 
effect was particularly noticeable in the nonsense figure results, where errorless 
training doubled the post-test success rate of the DS children. 


Why should the differential effect of the two training conditions be so marked 
in the handicapped group? It was suggested in the introduction that motivational 
factors may adversely affect the expression of competence in performance in DS. If 
motivational factors do influence performance in handicapped children over and 
above cognitive limitations on their performance, it might be expected that initial 
successful experience with errorless learning, by increasing motivation to learn, 
might have a priming effect on subsequent performance; initial trial-and-error 
learning experience might perhaps have the opposite effect, adversely affecting 
performance and reducing motivation to perform to full potential on the second 
discrimination task, even with errorless training. No such within-session effect 
would be predicted in the non-handicapped scores. Given the superior ability and 
better balanced history of failure and success in this group, there is no reason to 
expect that their motivation to learn would be either reduced by trial-and-error 
experience or increased to any significant degree by errorless experience. If 
anything, experience of either form of training, it might be predicted, would 
beneficially affect subsequent performance, simply by increasing familiarity with 
the type of task, a straightforward practice effect. 


Some limited support for the motivational hypothesis can be found by 
comparing training and post-test results in the two training subgroups of each group 
of children. Order of presentation of the two training conditions did appear to 
influence scores although the effect was not a strong one and, when present, was 
sometimes to be found in both groups of children. In both shape and nonsense 
figure discrimination tasks, for example, training scores favoured both of the 
subgroups given errorless training first, i.e., the children with no prior trial-and- 
error experience of the target concept; these trends in favour of errorless training 
were not, however, statistically significant in either group of children on either task. 
For DS children only, a statistically significant effect of training order was present in 
scores on the shape task; on the nonsense figure task again only a trend in favour of 
prior errorless training was evident. While these results are not very strong, they are 
consistent: in all cases where a trend was evident that trend was, without exception, 
in favour of children given errorless training first; in all cases, the effect was either 
greater or present only in the DS children. 


Some further support for the suggestion that prior, unsuccessful trial-and-error 
experience of a learning task can negatively affect training outcomes can be found 
by examining the differential enhancement effects of errorless training in the two 
discrimination tasks. In the nonsense figure tasks, tasks on which neither group 
could have had any prior learning experience, the enhancement effect of prior 
errorless training was clearly demonstrated, with the DS group benefiting most from 
prior error-free experience. A far weaker enhancement effect of errorless training 
procedures was found in the shape discrimination task results, both in the training 
and post-test scores. All children were likely to have had some prior experience of 
shape teaching, either formal or informal. Given that neither group of children was
proficient yet in shape discrimination, any such prior learning experience must, of
necessity, have been of a trial-and-error nature.


Although the two groups showed similar levels of pre-test ability, their learning
histories prior to training must inevitably have differed, even if only in length.
Given the difference in general ability levels of the two
groups, it seems reasonable to assume that in their lengthier pre-history, DS children 
would also have been exposed to a higher absolute and relative rate of failure than 
the non-handicapped children. This prior negative experience could perhaps account 
for the poorer training outcome with these discrimination tasks; prior experience of 
failure was actually being added to in the testing situation, further lowering 
expectation of success, even on the errorless learning task. 


Overall, then, the results suggest that errorless learning techniques may be of 
value both in themselves and by virtue of the priming effect they appear to have on 
subsequently presented trial-and-error learning tasks. The fact that trial-and-error 
training appeared to depress DS performance, even on an errorless task, underlines 
the importance of ensuring that expectations in the handicapped do not lie in the 
direction of failure. It is easy to see how constant failure due to intrinsic deficits in 
functioning could negatively influence general expectations of success in any 
situation, resulting ultimately in a form of learned helplessness (Seligman,
1975). 


Previous research suggests caution, however, in the use of errorless training. 
Dweck (1975), for instance, has questioned whether the most effective way of 
dealing with the poor response of mentally-handicapped children to failure is to 
eliminate the possibility of failure: teaching the child to respond positively to failure 
would seem in the long run to be more productive. Errorless learning is, after all, a 
very artificial teaching strategy, bearing little relation to real-life learning situations. 
Over-use could lead to an unhealthy and counterproductive dependence on this sort 
of learning support. Research also suggests that, while effective in teaching specific 
discriminations, errorless learning does not generalise readily to other tasks or even 
to post-tests of the same task (Gollin and Savoy, 1968; Etzel et al., 1981). The 
principles of errorless learning are, moreover, difficult to apply to higher level, more 
abstract learning tasks. Errorless learning may, however, play a useful role in the 
progression towards these more advanced forms of learning. Instead of imposing a 
difficult learning task on an apprehensive learner, errorless learning, by changing 
the success/failure rate that would normally be experienced, can perhaps be used to 
teach the child that he/she can learn and that learning can be easy, hopefully thereby 
raising motivation to put this newly realised skill to further use. 


Some anecdotal evidence from the study reported here can be offered in 
support of the contention that it is important to avoid establishing a “failure set”
that generalises to areas of comparative or untested strength (see also
Duffy, 1986). Patterns of performance in at least two of the DS children — children 
2 and 3, both 10-year-olds — suggested that the low scores achieved in the selection 
pre-test did not accurately reflect these children's current level of shape 
discrimination ability. When questioned at the end of the experiment, child 2 in fact 
readily admitted that he had deliberately underperformed in the earlier parts of the
experiment (although this had not been obvious in any way to the experimenter at 
the time). 


Similarly, two other DS children (also 10-year-olds), who were eventually
excluded from the experiment, produced overly consistent incorrect responses in the 
selection pre-test; when offered tangible reinforcement for correct responses both 
changed to near perfect ability to discriminate the shapes in question (see also Byck, 
1969; Shaw, 1986). According to the teacher, in one case at least, this behaviour was 
likely to have been a deliberate tactic produced to avert the possibility of being 
presented with a more difficult task; this particular child was always reluctant to 
admit to being able to perform certain tasks even when well within his abilities,
apparently suspicious that good performance on an “easy” task would lead to being
tested on more difficult tasks, where he was likely to experience failure.


This reluctance to perform to full potential seemed to be a characteristic of only 
the older DS children tested. A 6-year-old DS boy who had to be excluded from the 
study made no attempt to hide his capabilities in the selection pre-test. The older DS
children, by contrast, rather than allowing themselves to be placed in a situation 
over which they might have little control, seemed to be imposing their own control 
over the situation from the start, their poor performance very much a case of 
“won't do” rather than “can't do” (Koegel and Mentis, 1985). Whether this
behaviour is characteristic only of DS or is a product of age in combination with 
mental retardation remains to be investigated. Two further mentally-handicapped 
but non-DS 10-year-olds were run through the experimental procedures; although 
differing widely in levels of discrimination ability, neither showed any tendency to 
underperform. A further larger-scale investigation of these aspects of DS 
performance would seem therefore to be worthwhile. 


THE INCIDENCE OF AND INFLUENCES ON STRESS AND
BURNOUT IN SECONDARY SCHOOL TEACHERS


By SUSAN A. CAPEL
(Bedford College of Higher Education) 


Summary. This study investigated the relationship of eight selected psychological,
organisational and demographic variables with stress and burnout in secondary school
teachers. Teachers (N = 78) from four secondary schools completed self-report measures
of stress, burnout, role conflict, role ambiguity, locus of control, and organisational and 
demographic variables. Regression and follow-up canonical correlation analyses 
indicated that six of the eight selected variables were significantly related to stress, total 
burnout, frequency and intensity of burnout, emotional exhaustion, depersonalisation 
and personal accomplishment subscales. Role ambiguity and locus of control explained 
most variance on stress and all burnout scales except burnout intensity and emotional 
exhaustion, which were best explained by number of years teaching experience. Overall, 
however, stress and burnout levels were found to be low. Theoretical implications of the
study include the need to determine whether levels of stress and burnout increase during
the school year, and to identify further variables for inclusion in other studies.
Practical implications of how to overcome factors leading to stress and burnout as 
identified in this study are also discussed. 


INTRODUCTION 


Stress has become an area of interest among researchers and practitioners in many 
fields during the past few decades. Selye (1946) described stress as the non-specific 
response of the body to any demand made on it to adapt. Not all stress is damaging 
to the body, and some stress is needed to promote growth. In general usage, however,
the word stress has come to denote stress that is damaging to the body. When there is
a perceived imbalance between situational demands and one's capability to respond to
those demands, in a situation where the perceived consequences are important, the
individual may experience high levels of stress.


When job-related demands and stresses become excessive there can be many 
different possible reactions. Burnout has been identified as one type of chronic 
response to the cumulative, long-term negative impact of work stresses (Blase, 
1982). Daley (1979) defined burnout as a reaction to job-related stress that varies in 
nature with the intensity and duration of the stress itself. It may be manifested in 
workers becoming emotionally detached from their jobs and may ultimately lead 
them to leave their jobs altogether. The three most common symptoms of burnout 
have been identified as emotional exhaustion, depersonalisation and low personal 
accomplishment. 


Studies have indicated teaching to be a stressful occupation (Dunham, 1976;
Kyriacou and Sutcliffe, 1978a; 1978b). Other studies have indicated a relationship
between teaching and burnout. Among the factors identified in these studies as 
contributors to burnout are administrative practices, particularly support and
encouragement from administrators (Lawrenson and McKinnon, 1982; Zabel and
Zabel, 1982); personality and environmental factors (Nagy, 1982); and role conflict
and role ambiguity (Schwab, 1981; Westerhouse, 1979).


Burned-out teachers give significantly less information, less praise, less 
acceptance of their students’ ideas, and interact less frequently with them (Mancini et 
al., 1982, 1984). Thus, stress and burnout have a negative impact on teachers and 
the pupils they teach. The purpose of this study was an exploratory identification of
factors related to stress and burnout in secondary school teachers. Personal 
(demographic), situational (organisational) and psychological factors were included 
in a multivariate approach to looking at this problem. 


METHOD


Subjects
The sample included 78 full-time and part-time teachers employed in four 
British secondary schools. These teachers taught a variety of subjects, to pupils aged 
11-18 years. 


Design 

Eight variables were selected which had been identified in previous research as
predictors of stress and/or burnout and which were not correlated with the other
selected variables (such correlations would have caused high multicollinearity). These
included the personal factors of total number of years teaching experience, number of
years at present position, and how often school work was taken home; the
organisational factors of number of hours per week involved with extra-curricular
activities and number of different classes taught; and the psychological variables of
role conflict, role ambiguity and locus of control.


Scores for stress and burnout were used as the dependent variables. 
Relationships among these variables were examined using regression analyses. 


Measures 

The stress symptom scale used in the survey was that used by Kyriacou and 
Sutcliffe (1978a; 1978b) in their surveys of British high school teachers. This scale 
asks teachers to estimate how frequently during the school term they experience 17 
listed symptoms of stress. Answers are given on a five-point Likert type scale from 
never to many times a day. No validity and reliability measures are available for this 
scale, but it proved useful in research by Kyriacou and Sutcliffe and therefore was 
chosen for inclusion in the present study. 


Burnout was measured by the Maslach Burnout Inventory (MBI; Maslach and
Jackson, 1981). The MBI uses a two-scale format to determine the frequency and
intensity of burnout, and also provides scores on three subscales: emotional
exhaustion, depersonalisation and personal accomplishment. Slight wording
changes, from “recipient” to “pupil”, were incorporated in this study so that the
items applied specifically to teachers. Iwanicki and Schwab (1981) confirmed the validity and
reliability of this modified version of the scale. This is the most widely used and 
accepted instrument to test burnout in various populations within the helping 
professions. 


Role conflict and role ambiguity were measured by the role questionnaire of 
Rizzo, House and Lirtzman (1970). This comprises 14 items, 8 of which give 
information about role conflict and 6 about role ambiguity. The validity and 
reliability of this scale have been well documented (Rizzo et al., 1970; Schuler et al., 
1979; Schwab, 1981). This is a widely used scale for measuring role conflict and role 
ambiguity. 


Locus of control was measured by the Rotter Internal-External Locus of 
Control Scale (Rotter, 1966). This scale consists of 29 pairs of statements designed 
to elicit whether individuals assign responsibility to self or powerful others for 
outcomes in a variety of life situations. This scale is a widely used measure in 
psychological research in general (MacDonald, 1973) as well as in stress research 
(Lefcourt, 1971; 1976). The validity and reliability of the instrument have been well
established (Rotter, 1966; MacDonald, 1973; Fielding, 1982). 


A brief questionnaire was included to gather data on demographic and 
organisational characteristics of the respondents. The specific factors included in
it have been described above.


Procedures

The questionnaire was mailed to contact individuals in four secondary schools, who
were instructed to distribute the questionnaires to all staff (N = 160), via their
pigeonholes, in September 1985. The contact teacher reminded all teachers who
had not completed the questionnaire within three weeks to return it as soon as
possible. Completed questionnaires were returned to the researcher towards
the end of the autumn term.

Of the 160 questionnaires initially mailed out, 78 (56.5 per cent) were returned.
All were usable and were therefore included in the study.


Statistical analyses 

Descriptive analyses provided means and standard deviations for all 
independent and dependent variables. Three multivariate multiple regression 
analyses were used to determine the predictive influence of the demographic, 
organisational and psychological variables on (a) stress and burnout, (b) burnout 
frequency and intensity and (c) emotional exhaustion, depersonalisation and 
personal accomplishment. 


RESULTS 


Demographic and organisational variables 

Total number of years teaching experience ranged from 1 to 37 (m = 11.87
years), and number of years at present position ranged from 1 to 27 (m = 4.53
years). Number of different classes taught ranged from 1 to 16 (m = 6.79). Number
of days on which school work was taken home ranged from 0 to 7 (m = 5.2
days), and the number of hours per week involved with extra-curricular activities
ranged from 0 to 16 (m = 3.00 hours).


Psychological, stress and burnout variables 

Role conflict and role ambiguity scores were generally low, compared to 
teachers in other studies (Schwab, 1981). For both scales, the possible range of 
scores was 1 to 7. A score of 7 indicated a high level of role conflict, but a score of 1 
represented a high level of role ambiguity. 


Possible locus of control scores ranged from 1 to 24. Higher scores indicated an 
external locus of control, and lower scores indicated an internal locus of control. 
Sixty-three per cent of the teachers in this sample tended to an internal locus of 
control, and 37 per cent tended to an external locus of control. 


Stress scores among the teachers in this sample were generally low (m = 2.16).
However, this level was higher than Kyriacou and Sutcliffe (1978a; 1978b) found in
two samples of teachers (m = 1.77 and 1.84, respectively). Possible range of scores
was from 1 to 5. A score of 5 indicated a high level of stress, and a score of 1 
indicated a low level of stress. Specifically, 81 per cent of the teachers in this sample 
showed low levels of stress and 19 per cent medium levels of stress. 


Burnout scores for the sample were also generally low. Comparisons with other 
groups of teachers cannot be made because these other studies do not report total 
mean scores for the three subscales, but only indicate means for frequency and 
intensity of the three subscales separately. However, these scores are low when
compared to athletic trainers (Capel, 1986). The possible range of scores for total 
burnout was 0 to 7, with 7 indicating a high level of burnout. Fifty-one per cent of 
teachers in this sample indicated they were experiencing a low level of burnout, and 
49 per cent a medium level. None of the teachers was experiencing a high level of 
burnout. Similar results were found on all burnout subscale scores. Means, standard 
deviations and ranges of these psychological, stress and burnout variables can be 
seen in Table 1. 











TABLE 1
PSYCHOLOGICAL, STRESS AND BURNOUT VARIABLES PROFILE: MEANS, STANDARD DEVIATIONS AND
RANGE

Variable                   Mean     SD       Low     High
Role conflict              2.88     0.94     1.5     4.75
Role ambiguity             4.98     0.91     3.0     6.5
Locus of control           11.57    4.2      4.0     20.0
Stress                     2.16     0.41     1.53    3.12
Total burnout              3.02     0.52     1.75    3.98
Burnout frequency          2.67     0.49     1.64    3.82
Burnout intensity          3.38     0.66     1.86    4.73
Emotional exhaustion       2.78     0.975    0.78    4.56
Depersonalisation          0.96     0.969    0.0     3.4
Personal accomplishment    4.55     0.620    3.44    5.63





Regression analyses 

The first multivariate multiple regression analysis was conducted with the eight 
predictor variables and the criterion variables of stress and burnout. A significant 
multivariate effect was obtained: Wilks' lambda = 0.40, F (8, 49) = 3.47,
P < 0.0001. Thus, these eight variables were predictive of stress and burnout.
Standardised regression coefficients and canonical correlation analyses were used as 
follow-up measures. 
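Wilks' lambda summarises the whole predictor-criterion relationship and can be written in terms of the squared canonical correlations, Λ = Π(1 − Rc²). A minimal Python sketch of this identity, together with Bartlett's chi-square approximation for testing Λ, follows. The function names and the two canonical correlations are hypothetical illustrations, not values from this study, and the F statistics reported above come from a different approximation that is not reproduced here.

```python
import math

def wilks_lambda(canonical_corrs):
    """Wilks' lambda as the product of (1 - Rc**2) over all canonical correlations."""
    lam = 1.0
    for rc in canonical_corrs:
        lam *= 1.0 - rc ** 2
    return lam

def bartlett_chi_square(lam, n, p, q):
    """Bartlett's chi-square approximation for testing Wilks' lambda.

    n = sample size, p = number of predictors, q = number of criterion variables.
    Returns the statistic and its degrees of freedom (p * q).
    """
    statistic = -(n - 1 - (p + q + 1) / 2) * math.log(lam)
    return statistic, p * q

# Hypothetical canonical correlations for 8 predictors and 2 criteria, N = 78.
lam = wilks_lambda([0.60, 0.30])
stat, df = bartlett_chi_square(lam, n=78, p=8, q=2)
print(f"Wilks' lambda = {lam:.4f}; chi-square = {stat:.2f} on {df} df")
```

With these inputs Λ = 0.64 × 0.91 = 0.5824; the smaller Λ is, the stronger the multivariate relationship.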


Standardised regression coefficients provided a measure of the relative extent to 
which each of these eight factors contributed to the prediction of each of the two 
criterion variables. The standardised beta weights for this analysis suggested that 
locus of control was the best predictor of stress, and role ambiguity was the best 
predictor of burnout. An external locus of control and higher role ambiguity were 
associated with higher stress and higher burnout, respectively. These scores are
summarised in Table 2.
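Standardised regression coefficients are the least-squares weights obtained when every predictor and criterion is first converted to z-scores, which is what allows weights for variables measured in different units (years, hours, scale points) to be compared. The sketch below assumes complete data and uses simulated scores, not the study's; the function name `standardised_betas` is hypothetical. In a multivariate multiple regression the coefficient estimates for each criterion equal those from a separate ordinary least-squares fit, so one call to `numpy.linalg.lstsq` handles both criteria.

```python
import numpy as np

def standardised_betas(X, Y):
    """Standardised regression weights, one column per criterion variable.

    X: (n, p) predictor scores; Y: (n, q) criterion scores.
    Both are converted to z-scores, so no intercept term is needed.
    """
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    # Solving for all criterion columns at once gives the same weights
    # as fitting each criterion separately by ordinary least squares.
    betas, *_ = np.linalg.lstsq(Zx, Zy, rcond=None)
    return betas

# Hypothetical simulated scores for 78 cases and 8 predictors (not the study's data).
rng = np.random.default_rng(0)
n, p = 78, 8
X = rng.normal(size=(n, p))
stress = 0.5 * X[:, 7] + rng.normal(size=n)
burnout = -0.4 * X[:, 6] + rng.normal(size=n)
B = standardised_betas(X, np.column_stack([stress, burnout]))
print(B.shape)   # (8, 2)
print(B.round(3))
```

The same weights can be obtained from raw-score slopes as b × s_x / s_y, which is a useful cross-check on any implementation.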


To examine further the multivariate relationship between the predictor 
variables and the criterion variables of stress and burnout, a canonical correlation 
analysis was also conducted. In this analysis, only the first canonical correlation was 
significant (Rc1 = 0.737, P < 0.0001). The redundancy index revealed that 38.3 per
cent of the variance in the criterion variables was predicted by a linear combination
of the predictor variables. A value of 10 per cent or greater is considered to be a 
significant amount (Pedhazur, 1982), therefore this represented a significant and 
meaningful amount of shared variance between the predictor variables and stress 
and burnout. 
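Canonical correlation finds the weighted combination of predictors and the weighted combination of criterion variables that correlate most highly; the redundancy index then expresses how much criterion variance a predictor combination reproduces. The sketch below is one way to compute both from a correlation matrix, assuming the Stewart-Love definition of redundancy (mean squared criterion loading multiplied by Rc²). It runs on simulated data, and `canonical_analysis` is a hypothetical helper, not the software used in the study.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def canonical_analysis(X, Y):
    """Canonical correlations, weights, structure loadings and redundancy.

    X: (n, p) predictor scores; Y: (n, q) criterion scores.
    Works on the correlation matrix, i.e. on standardised variables.
    """
    p = X.shape[1]
    R = np.corrcoef(np.column_stack([X, Y]), rowvar=False)
    Rxx, Ryy, Rxy = R[:p, :p], R[p:, p:], R[:p, p:]
    Kx, Ky = _inv_sqrt(Rxx), _inv_sqrt(Ryy)
    # Singular values of Rxx^(-1/2) Rxy Ryy^(-1/2) are the canonical correlations.
    U, rc, Vt = np.linalg.svd(Kx @ Rxy @ Ky, full_matrices=False)
    x_weights = Kx @ U              # standardised weights for predictor variates
    y_weights = Ky @ Vt.T           # standardised weights for criterion variates
    x_loadings = Rxx @ x_weights    # correlation of each predictor with its variates
    y_loadings = Ryy @ y_weights    # correlation of each criterion with its variates
    # Stewart-Love redundancy: mean squared criterion loading times Rc^2.
    redundancy = (y_loadings ** 2).mean(axis=0) * rc ** 2
    return {"rc": rc, "x_weights": x_weights, "y_weights": y_weights,
            "x_loadings": x_loadings, "y_loadings": y_loadings,
            "redundancy": redundancy}

# Hypothetical simulated example: 78 cases, 8 predictors, 2 criteria.
rng = np.random.default_rng(1)
X = rng.normal(size=(78, 8))
Y = np.column_stack([X[:, 7] + rng.normal(size=78),
                     X[:, 6] - X[:, 0] + rng.normal(size=78)])
res = canonical_analysis(X, Y)
print("canonical correlations:", res["rc"].round(3))
print("redundancy per function:", res["redundancy"].round(3))
```

Summed over all canonical functions, this redundancy equals the average squared multiple correlation of the criterion variables with the full predictor set, which offers a convenient check.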
TABLE 2

STANDARDISED REGRESSION COEFFICIENTS: EIGHT DEMOGRAPHIC, ORGANISATIONAL AND
PSYCHOLOGICAL VARIABLES WITH STRESS AND BURNOUT

Variable                             Stress      Burnout
Years at present position           -0.624**    -0.341
Hours extra-curricular activities   -0.255*      0.052
How often school work taken home     0.072       0.175
Years teaching experience            0.624**     0.312
Number of different classes         -0.135      -0.273*
Role conflict                        0.109      -0.065
Role ambiguity                      -0.450**    -0.419**
Locus of control                     0.636**     0.319*

Scores indicate the relative extent to which the eight predictor variables contributed to each of the
two criterion variables.

*P < 0.05
**P < 0.01


Canonical loadings were calculated to find out the relative contribution of each 
of the variables within each variate set to the overall multivariate relationship. Of 
the criterion variables, the canonical loadings showed that stress contributed more 
to the canonical correlation than did burnout. Of the predictor variables, locus of 
control contributed most to the relationship of the two sets of variates. Loadings 
greater than or equal to 0.30 are considered to be significant (Pedhazur, 1982). 


TABLE 3

CANONICAL LOADINGS FOR THE EIGHT PREDICTOR VARIABLES AND THE CRITERION VARIABLES OF STRESS
AND BURNOUT

Variable                             Loadings

Predictor Variables
Years at present position           -0.395*
Hours extra-curricular activities   -0.999
How often school work taken home     0.356*
Years teaching experience           -0.316*
Number of different classes         -0.185
Role conflict                        0.507*
Role ambiguity                      -0.474*
Locus of control                     0.543*

Criterion Variables
Stress                               0.898*
Burnout                              0.169

Scores indicate the relative contribution of each of the variables within each variate set.

*loading ≥ 0.30    Rc1 = 0.737    Rc1² = 0.544
Burnout, therefore, did not contribute to the relationship. Five of the remaining
seven predictor variables contributed significantly to the relationship, excluding 
number of hours involved with extra-curricular activities and number of different 
classes taught. Table 3 shows the loadings for each of these variables. 


These results indicated that an external locus of control, higher role conflict,
higher role ambiguity, fewer years at present position, taking work home to do more 
frequently and having fewer years teaching experience contributed to higher levels of 
stress. 


The second multivariate multiple regression analysed these eight predictor 
variables with the criterion variables of burnout frequency and intensity. The 
multivariate analysis was found to be significant: Wilks' lambda = 0.572, F (8, 53)
= 2.10, P = 0.014. Thus, these eight variables were predictive of burnout
frequency and intensity. 


Standardised regression coefficients indicated that role ambiguity best 
predicted burnout frequency and total number of years teaching experience best 
predicted burnout intensity. High role ambiguity was associated with experiencing 
burnout more frequently, and having fewer years teaching experience was
associated with experiencing burnout more intensely. Table 4 shows a summary of 
these results. 


TABLE 4

STANDARDISED REGRESSION COEFFICIENTS: EIGHT PREDICTOR VARIABLES WITH BURNOUT FREQUENCY
AND INTENSITY

                                     Burnout      Burnout
Variable                             Frequency    Intensity
Years at present position           -0.215       -0.399
Hours extra-curricular activities   -0.049        0.131
How often school work taken home     0.144        0.171
Years teaching experience            0.099       -0.418**
Number of different classes         -0.243       -0.249
Role conflict                       -0.011       -0.129
Role ambiguity                      -0.438**     -0.340*
Locus of control                     0.195       -0.339*

*P < 0.05
**P < 0.01


Follow-up canonical correlation analysis indicated that only the first canonical 
correlation was significant, Rc1 = 0.575, P = 0.014. The redundancy index
showed that 25.7 per cent of the total variance of the criterion measures was
predicted from a linear combination of the predictor variables. 


The canonical loadings showed that burnout frequency contributed relatively 
more to the canonical correlation than burnout intensity, although burnout intensity 
also made a significant contribution. Of the predictor variables, role ambiguity 
contributed most to the overall relationship of the two sets of variables. All of the 
other variables except number of hours involved with extra-curricular activities and 
number of different classes taught also made a significant contribution to the 
relationship. These loadings indicated that higher role ambiguity, higher role 
conflict, taking work home to do more frequently, fewer years at present position,
fewer years teaching experience and an external locus of control contributed to 
teachers experiencing burnout more frequently and more intensely. Table 5 shows a 
summary of these canonical loadings. 
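Applying the 0.30 rule for significant loadings (Pedhazur, 1982) to the predictor loadings in Table 5 is simple arithmetic on absolute values. A short sketch, with the loadings transcribed from Table 5 and a hypothetical helper `salient`, keeps the predictors that reach the cut-off and sorts them by the size of their loadings:

```python
# Predictor canonical loadings transcribed from Table 5.
TABLE5_PREDICTOR_LOADINGS = {
    "Years at present position": -0.430,
    "Hours extra-curricular activities": 0.145,
    "How often school work taken home": 0.452,
    "Years teaching experience": -0.422,
    "Number of different classes": -0.260,
    "Role conflict": 0.548,
    "Role ambiguity": -0.708,
    "Locus of control": 0.390,
}

def salient(loadings, cutoff=0.30):
    """Hypothetical helper: names whose absolute loading reaches the cut-off, largest first."""
    kept = [(name, value) for name, value in loadings.items() if abs(value) >= cutoff]
    kept.sort(key=lambda item: -abs(item[1]))
    return [name for name, _ in kept]

print(salient(TABLE5_PREDICTOR_LOADINGS))
```

Run on these values it keeps six predictors, headed by role ambiguity, and drops hours of extra-curricular activities and number of different classes, in line with the description in the text. The sign of each loading is ignored for selection and must be read separately for direction.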


TABLE 5

CANONICAL LOADINGS FOR THE EIGHT PREDICTOR VARIABLES AND THE CRITERION VARIABLES OF
BURNOUT FREQUENCY AND INTENSITY

Variable                             Loadings

Predictor Variables
Years at present position           -0.430*
Hours extra-curricular activities    0.145
How often school work taken home     0.452*
Years teaching experience           -0.422*
Number of different classes         -0.260
Role conflict                        0.548*
Role ambiguity                      -0.708*
Locus of control                     0.390*

Criterion Variables
Burnout frequency                    0.750*
Burnout intensity                    0.353*

*loading ≥ 0.30    Rc1 = 0.575    Rc1² = 0.331


The third multivariate multiple regression analysis was conducted using the 
eight predictor variables, with the criterion variables of emotional exhaustion, 
depersonalisation and personal accomplishment. A significant multivariate effect 
was obtained: Wilks' lambda = 0.366, F (8, 53) = 2.56, P < 0.0001. Thus, these
criterion variables were predicted by the eight predictor variables. 


Standardised regression coefficients were calculated, which suggested that 
emotional exhaustion was best predicted by number of years at present position, 
depersonalisation by role ambiguity and personal accomplishment by locus of 
control. The scores for each of these variables are summarised in Table 6. Having 
fewer years at present position was associated with higher emotional exhaustion, 
higher role ambiguity was associated with higher depersonalisation and an external 
locus of control was associated with lower personal accomplishment. 


Two canonical correlations were significant, Rc1 = 0.648, P < 0.0001; and
Rc2 = 0.574, P = 0.032. These results indicate that there were two unique
solutions for the predictive influence of these eight predictor variables on emotional 
exhaustion, depersonalisation and personal accomplishment. There is, then, a high 
percentage of shared variance between these pairs of linear scores, but they are not 
completely overlapping. Interpretation of these results is therefore difficult and 
limited. The redundancy indexes were found to be 16.2 per cent for the first
canonical correlation and 14.3 per cent for the second, implying that 30.5 per cent
of the total variance of the criterion measures was predicted from a linear
combination of the predictor variables. Individually, both indices also accounted for a
significant and meaningful amount of variance.
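The two redundancy indexes can be recomputed from the criterion loadings and squared canonical correlations in Table 7. The text does not name the formula used; assuming the Stewart-Love definition (mean squared criterion loading on a canonical variate multiplied by Rc²), the sketch below reproduces the reported 16.2 and 14.3 per cent and their 30.5 per cent total. The helper `redundancy` is hypothetical.

```python
# Criterion loadings (1st, 2nd canonical function) transcribed from Table 7.
criterion_loadings = {
    "Emotional exhaustion":    (0.421, 0.906),
    "Depersonalisation":       (0.085, 0.688),
    "Personal accomplishment": (0.986, 0.064),
}
rc_squared = (0.420, 0.330)   # squared canonical correlations from Table 7

def redundancy(loadings, rc2, j):
    """Assumed Stewart-Love redundancy for canonical function j:
    mean squared criterion loading times the squared canonical correlation."""
    squared = [pair[j] ** 2 for pair in loadings.values()]
    return sum(squared) / len(squared) * rc2[j]

rd = [redundancy(criterion_loadings, rc_squared, j) for j in range(2)]
print([round(100 * r, 1) for r in rd])   # [16.2, 14.3]
print(round(100 * sum(rd), 1))           # 30.5
```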
TABLE 6

STANDARDISED REGRESSION COEFFICIENTS: EIGHT PREDICTOR VARIABLES WITH EMOTIONAL
EXHAUSTION, DEPERSONALISATION AND PERSONAL ACCOMPLISHMENT

                                     Emotional     Depersonal-    Personal
Variable                             Exhaustion    isation        Accomplishment
Years at present position           -0.529*       -0.125          0.255
Hours extra-curricular activities   -0.206         0.079          0.459**
How often school work taken home     0.244         0.030         -0.059
Years teaching experience            0.516*        0.125         -0.344
Number of different classes         -0.182        -0.183         -0.137
Role conflict                       -0.216         0.007          0.187
Role ambiguity                      -0.347**      -0.294*        -0.078
Locus of control                     0.516**       0.293*        -0.524**

*P < 0.05
**P < 0.01


The first set of canonical loadings showed that personal accomplishment 
contributed relatively more to the canonical correlation than emotional exhaustion, 
although this also contributed a significant amount. Depersonalisation did not make 
a significant contribution to the multivariate relationship. Of the predictor 
variables, locus of control contributed most to the overall relationship of the two 
sets of variables. Number of hours involved with extra-curricular activities, role 
conflict and role ambiguity also made a significant contribution. These loadings 
indicated that an external locus of control, greater number of hours involved with
extra-curricular activities, higher role conflict and higher role ambiguity were
associated with lower personal accomplishment and higher emotional exhaustion.


TABLE 7

CANONICAL LOADINGS FOR THE EIGHT PREDICTOR VARIABLES AND THE CRITERION VARIABLES OF
EMOTIONAL EXHAUSTION, DEPERSONALISATION AND PERSONAL ACCOMPLISHMENT

                                     Loadings
Variable                             1st correlation    2nd correlation

Predictor Variables
Years at present position            0.119             -0.400*
Hours extra-curricular activities   -0.468*             0.156
How often school work taken home     0.032              0.487*
Years teaching experience            0.074             -0.351*
Number of different classes          0.190             -0.233
Role conflict                       -0.406*             0.437*
Role ambiguity                       0.368*            -0.593*
Locus of control                     0.479*             0.484*

Criterion Variables
Emotional exhaustion                 0.421*             0.906*
Depersonalisation                    0.085              0.688*
Personal accomplishment              0.986*             0.064

*loading ≥ 0.30    Rc1 = 0.648    Rc2 = 0.574    Rc1² = 0.420    Rc2² = 0.330


The second set of canonical loadings indicated that emotional exhaustion 
contributed more to the canonical correlation than depersonalisation, which also 
contributed significantly to the relationship. Personal accomplishment did not make
a significant contribution to the relationship. Of the predictor variables, role
ambiguity contributed most to the overall relationship of the two sets of variates. All 
the predictor variables except number of hours involved with extra-curricular 
activities, and number of different classes taught contributed significantly to the 
relationship between the two sets of variates. These results suggested that higher 
emotional exhaustion and higher depersonalisation are associated with higher role 
ambiguity, an external locus of control, taking work home to do more frequently, 
higher role conflict, fewer years at present position, and having fewer years of 
teaching experience. Table 7 shows a summary of these two sets of canonical 
loadings. 


DISCUSSION 


Care must be taken in interpreting results of this study because of the low 
number of subjects and low questionnaire return rate. Possible reasons for the low
levels of burnout found in this sample are that burned-out teachers had already left
the profession or did not return the questionnaire, or that burnout does not really
exist and current interest in it may be creating a problem that is not really there.


Results did, however, indicate that six of the eight independent variables were 
related to stress and/or burnout; number of hours involved with extra-curricular 
activities and number of different classes taught were not related. Individual 
teachers could assess each of these factors to determine how each relates to stress 
and/or burnout for him/her personally. Further, school administrators could assess 
the impact of these variables on burnout among staff at the school. 


Specific methods of reducing role conflict and role ambiguity might be 
implemented. Administrators could help to establish opportunities for teachers to 
feel in control of their own actions by involving them more in decision making on 
matters of concern to them. Further, taking work home less often could help
reduce burnout. This would provide an essential break from school work and an 
opportunity to become refreshed and ready to start again. Thus, individual teachers 
as well as administrators can take action to help reduce stress and burnout produced 
by demographic, organisational and psychological variables identified in this study. 


At most 38.3 per cent of the variance in stress and burnout was accounted for by
the variables included in this study. This indicates that many other factors are
important in helping to explain stress and burnout. Furthermore, this study
was undertaken in the autumn term of 1985. At this point in the school year teachers may
not be experiencing high stress and burnout. A further study is being undertaken to 
determine whether stress and burnout levels increase during the school year, and 
what other factors may contribute to this. 


Correspondence and requests for reprints should be addressed to Dr. Susan A. Capel, 
School of Human Movement Studies, Bedford College of Higher Education, 37 Lansdowne 
Road, Bedford, MK40 2BZ, England. 
A-LEVEL EXPECTANCIES AND UNIVERSITY ASPIRATIONS
OF MALES AND FEMALES 


By KAREN JANMAN 
(Department of Psychology, University College, London) 


Summary. This study, utilising both an examination of the national statistics and a 
detailed survey of a smaller group of sixth form students, draws attention to the sex 
differences still found in expected and actual examination results. In gross numbers of 
students and passes, girls are likely to reach equality with boys up to A-level in the fairly 
near future. However, two areas of sex differences, the sex stereotyping of subject 
choices and the grade attainments of males and females, still represent sources of grave 
concern for anyone worried about inequality. In the future the real test will be whether 
the boundaries break down around the courses that are now still predominantly male or 
female in recruitment, and whether the more general sex-typing of higher education as 
masculine subsides sufficiently for females to both want, and expect, to equal the 
performance of their male peers. 


INTRODUCTION 


Brief review of the educational statistics 

Despite the Sex Discrimination Act of 1975, reports and research concerned with the 
educational experiences and attainment of both males and females continue to show 
that the final educational attainment of women typically falls behind that of men. 


The movement of females with age from a superiority to an inferiority relative 
to their male counterparts can be consistently traced through the literature. In 
general, reading ability appears to be superior in girls up to the age of about 11 years 
(Douglas, 1964; DES, 1978) whilst mathematical ability shows only a few 
differences, in spatial perception for example, although these latter results are less 
clear cut and frequently fail to reach statistical significance (see Wilkin, 1982, for a 
review of this literature). Most reviewers of the literature on infant and primary 
school children conclude that by halfway through their schooling there are no 
outstanding differences between the sexes in overall attainment in the two basic 
subjects of reading and mathematics. 


By the time pupils sit their Certificate of Secondary Education (CSE) and
General Certificate of Education (GCE) O-level examinations, the specific
subject specialisation which continues through their academic choices is very
apparent. Although 50.8 per cent of passes at O-level in 1984 were attained by girls
(Statistics of Education, 1984), the strong sex differences in O-level subjects taken 
result in girls’ successes being generally in arts, social science and language subjects, 
compared with those in mathematics and pure sciences taken by boys. 


At this stage, and by these measures, girls are slightly more successful than 
boys, although the interests of both sexes lie within specific subject ranges. 


The proportions of the sexes staying on at school for two years to take GCE
A-levels are approximately equal (girls 18 per cent, boys 19 per cent) but it is here
that the most significant differences in educational attainment begin to appear 
(Social Trends, 1984). Fewer girls than boys (7 per cent vs. 10 per cent) attain three 
or more A-level passes. Thus, despite the expectation from O-level results that an 
equal or higher proportion of girls than boys would enter for A-level examinations, 
the opposite occurs. For example, in summer 1984, girls entered for fewer A-level 
subjects than boys and their results, pro rata, were slightly but definitely poorer 
TABLE 1
A-LEVEL RESULTS IN SELECTED SUBJECTS, SUMMER 1983

                                                             Passes as % of entries, by grade
Subject              Sex    Entries   Passes  Passes as %      A      B      C      D      E
                                              of entries
English              F     45,287   33,285   73.50        7.7      …      …   16.6   20.0
                     M     19,028   14,076   73.98        8.6   15.6   15.2   16.6   18.0
Maths                F     19,766   13,665   69.13        9.7   13.5      …   15.1      …
                     M     47,626   32,156   67.52       13.3   13.5   10.7   13.3   16.8
Further Maths        F      1,702    1,365   80.20       13.9   18.3   13.8   17.5   16.9
                     M      5,469    4,526   82.76       19.6   18.3   13.9   14.8   16.2
Physics              F     11,412    8,135   71.28        8.3      …      …   15.1   20.2
                     M     44,056   30,808   67.93       11.6      …   11.6      …   19.3
Chemistry            F     17,147   12,457   72.65       10.3   15.5   12.6   13.9   20.4
                     M     30,645   22,960   74.92       13.3   16.4   12.3   14.0   19.0
Biology              F     27,413   18,785   68.53        8.8   13.4   10.5   14.9   20.9
                     M     18,694   13,081   69.97       10.2   14.2   10.9   15.5   19.2
Geography            F     14,447   10,503   72.70       10.5   15.5   13.9   13.9   19.1
                     M     20,583   14,543   70.66        9.1   14.6   12.9   14.3   19.6
History              F     20,441   13,788   67.45        7.3   11.7      …   16.5   20.4
                     M     17,704   12,759   72.07        9.6   15.8      …      …   19.3
French               F     18,542   13,367   72.09        9.4   14.5   13.8   17.1   18.9
                     M      6,627    4,988   75.27       12.6   15.8   14.1   17.4   15.4
German               F      6,489    5,191   80.00       14.0   17.0   13.3   16.2   14.2
                     M      2,613    2,191   83.85       18.9   19.7   13.1   14.5   17.6
Art                  F     16,698   12,808   76.70        6.5      …   15.4   17.3   25.3
                     M     10,312    7,633   74.02        7.2   11.8   14.2   16.6   24.2
Music                F      2,966    2,190   73.84        9.5   15.1   12.6   15.7   21.0
                     M      1,506    1,193   79.22       13.8   18.5   13.6   15.3   18.0
Religious Education  F      3,890    2,481   63.78        6.9   13.8   12.4   13.8   16.9
                     M      1,484      953   64.22       10.7   13.8   10.9   12.2   16.6
Domestic Subjects    F      5,317    3,459   65.06        6.3   11.2   11.3   14.1   22.1
                     M         40       24   60.00        2.5      …      …   17.5   17.5
Technical Drawing    F         95       63   66.32        6.3    8.4   10.5      …   28.4
                     M      3,105    2,135   68.76        8.3   15.6    9.8      …   22.0
Total for all        F    298,274  204,743   68.64        7.9   13.2   12.3   15.3   19.9
subjects taken, 1983 M    333,853  232,294   69.58       10.7   14.1   12.0   14.1   18.7

[Source: Statistics of Education, 1983, Vol. 2.]
(Statistics of Education, 1984). In addition to their differential performance, the sex
differentiation of subject choices is more marked at A-level than at O-level, with
over twice as many boys as girls following science subjects. This disproportion is 
even greater if mathematics is included (see Table 1). 


The percentages of males and females entering degree courses at universities, 
polytechnics and other establishments reflect a similar, unequal distribution of the 
sexes. In 1982, 9.1 per cent of boys compared with 6.9 per cent of girls chose degree
courses, with relatively more boys (6.9 per cent) than girls (4.9 per cent) choosing a
university education. Such findings have been reported most recently by Garratt
(1986). She found that, whilst the majority of a sample of A-level students indicating
that they hoped to go on to further education were female, among those hoping to
continue their education boys generally had higher aspirations than girls. This sex
difference is reflected in the statistic that only 43 per cent of the total United 
Kingdom group of applicants for entry to university in 1984 were female (UCCA, 
1984). 


The role of expectancy in academic aspirations and achievement 

Two possible causes of lower achievement in women which are frequently 
studied are expectancy (of success in a given area) and interest or incentive. The 
latter has been well documented as an important determinant of academic choices 
(Eccles et al., 1984; Janman, 1985; Garratt, 1986) as well as occupational choices 
(Betz and Hackett, 1981; Janman, 1987) and will not be of primary concern here. 
Rather, the present study aimed to investigate the possible role of expectancy in the 
A-level choices and university aspirations of a sample of male and female sixth form 
pupils. Whilst a considerable literature exists in psychological journals and books 
concerning, often experimentally induced, expectancies over a variety of age ranges, 
no study known to the present author has considered expectancies for A-level 
success by British students. Following a summary of the literature relating to sex 
differences in expectancy, some empirical data will be described which sought to 
redress this balance. 


Past studies, using novel tasks, have generally found that girls who are 8 or 
more years old have lower initial expectancies of success than do boys (Crandall, 
1969; Dweck and Gilliard, 1975; Parsons and Ruble, 1977). In contrast, familiarity 
with the task has been found to influence sex differences in expectancy. For 
example, although Parsons and Ruble found an initial sex difference in the 
children's expectations, the sex difference disappeared after a series of successes at 
the task. Frieze et al. (1978) interpreted this pattern as reflecting the difference 
between specific and generalised expectation. They argued that girls’ generalised 
expectations are lower than those of boys, but that their specific expectations, like 
those of boys, are largely determined by performance history. It is their generalised 
expectations rather than specific expectations, however, that should have the more 
powerful influence on decisions regarding future achievements, and generalised 
expectations appear to be especially detrimental to girls' academic choices and
achievements. For example, Meece et al. (1982) describe how, because advanced
mathematics differs from earlier mathematics courses and students perceive 
advanced mathematics courses as more difficult than their current mathematics 
course, generalised expectations would be the most salient. 


This distinction between generalised and specific expectations has also been 
emphasised by Lenney (1977). She criticises the conclusions of Maccoby and Jacklin 
(1974) that, 


“. . . college men are more likely than college women to expect to do well, and
to judge their own performance favourably once they have completed their
work” (p. 154),
arguing that they should have qualified their conclusion by discussing studies in
which this sex difference was not found. In a reconsideration of the studies reviewed 
by Maccoby and Jacklin she shows how important situational factors, such as the 
degree of novelty or generality of the task, can affect the extent of the sex 
differences found in expectancy. 


Another situational factor which has been found to exert a strong influence 
upon sex differences in expectancy is that of the degree of sex-typing of the task. 
Both evidence reviewed by Lenney (1977) and that published more recently 
(McHugh et al., 1982; Gitelson et al., 1982; McHugh and Frieze, 1982) show that, 
when sex appropriateness of the experimental task is manipulated, females expect to 
do at least as well as males on “feminine” and some neutral tasks. This has also
been shown in a more applied study of expectancies for various occupational choices 
(Janman, 1987). 


The aim of the present study was to investigate whether girls estimate their 
A-level performance with reference to their O-level results as much as do boys, or 
whether their lower generalised expectations depress their expectations for their 
A-level results. Since an important factor in decisions regarding whether or not to 
apply to university is likely to be the students’ perception of their own abilities, it 
may be argued that a possible reason for females deciding not to apply to university 
is that they do not anticipate attaining high enough grades. The study was designed 
to compare the past (actual) O-level grades with the future (predicted) A-level grades 
of male and female sixth form pupils and to investigate whether these differed as a 
function of the pupils’ sex and their intentions regarding university application. 


METHOD 
Subjects 

Pupils were taken from a mixed sex sixth form college in Oxfordshire. All were 
studying for at least one A-level subject, the maximum number of subjects studied 
being four. Since it was hoped to compare expectancies of those who had decided to 
apply to university with those who had not, pupils were studied in the first term of 
their second sixth form year, shortly after the university application (UCCA) forms 
had been distributed to them. It was the policy of the sixth form college to ask all 
pupils whether or not they intended to apply to university and to offer assistance 
both in coming to a decision and in the subsequent completion of the form should 
this be needed. All subjects had, therefore, given some thought to their intentions 
regarding future careers and all had been asked, shortly before taking part in this 
study, to come to some decision as to whether or not they wished to apply to 
university. 

A total of 189 pupils completed questionnaires, although seven were excluded 
from the analysis as a result of failing to indicate their sex and/or their intentions 
regarding university application. 

The questionnaires were filled in during general study periods and all second
year pupils in the college were tested. A teacher was present during the completion
of the questionnaires.


Procedure and materials 
Each pupil was given a questionnaire which asked him/her to provide the 
following information: 
(a) their sex and age, 
(b) all subjects taken at CSE or GCE O-level together with the grades awarded,
(c) all subjects studied to A-level, and the expected grade attainment, 
(d) intentions regarding university application. 
It was stressed to the pupil, both verbally before completion of the
questionnaires and in the questionnaire instructions, that all information provided 
would be treated confidentially. Anonymity was also maintained, no names being 
required on the questionnaires. 


Pupils completed the questionnaire in ordinary classroom conditions, although
they were requested not to discuss their responses with their neighbours. All 
questionnaires were completed within 15 minutes. 


RESULTS 


The usable sample is described in Table 2 which gives the number of males and 
females according to their intentions regarding university application. 


TABLE 2
SAMPLE DESCRIPTION AND MEAN SCORES FOR EACH VARIABLE

                                           MALES                       FEMALES
                                   Applying   Not Applying     Applying   Not Applying
Number of subjects                    64           31             42           45
Mean no. of O-level passes           7.8          6.6            8.1          6.8
Mean total no. of O-level points      40           30             41           31
Mean no. of A-levels being studied   3.0          2.9            2.9          2.3
Mean total expected no. of
  A-level points                    14.2         11.0           12.4          7.7
Mean expected A-level grade          4.7          3.8            4.2          3.3


Thus, this sample consisted of almost equal numbers of males (N = 95; 52 per
cent) and females (N = 87; 48 per cent). However, 67 per cent of males, compared
with only 48 per cent of females, replied that they intended to apply for university 
admission. Females therefore comprised only 40 per cent of those individuals in the 
sample who expected to apply to university. This reflects the national statistics 
which show females to comprise only 43 per cent of the total United Kingdom group 
of applicants for entry to university in 1984 (UCCA, 1984). 
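The percentages in this paragraph follow directly from the cell counts in Table 2. As an illustrative sketch (the dictionary layout and the helper name `pct` are assumptions, not the author's), they can be recomputed in Python:

```python
"""Recompute the sample percentages from the Table 2 cell counts.

The data structure and helper names are illustrative assumptions.
"""

# Number of pupils in each (sex, intention) cell, from Table 2.
COUNTS = {
    ("male", "applying"): 64,
    ("male", "not applying"): 31,
    ("female", "applying"): 42,
    ("female", "not applying"): 45,
}


def pct(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


males = COUNTS[("male", "applying")] + COUNTS[("male", "not applying")]
females = COUNTS[("female", "applying")] + COUNTS[("female", "not applying")]
applicants = COUNTS[("male", "applying")] + COUNTS[("female", "applying")]

share_male = pct(males, males + females)
males_applying = pct(COUNTS[("male", "applying")], males)
females_applying = pct(COUNTS[("female", "applying")], females)
female_share_of_applicants = pct(COUNTS[("female", "applying")], applicants)

if __name__ == "__main__":
    print(f"Males in sample: {males} ({share_male:.1f} per cent)")
    print(f"Females in sample: {females} ({100 - share_male:.1f} per cent)")
    print(f"Males intending to apply: {males_applying:.1f} per cent")
    print(f"Females intending to apply: {females_applying:.1f} per cent")
    print(f"Females among intending applicants: {female_share_of_applicants:.1f} per cent")
```

Rounded to whole numbers, these give the 52, 48, 67, 48 and 40 per cent quoted in the text.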


Table 2 also shows the mean number of O-level passes (grades A-C inclusive) 
attained by each of the four groups, males applying, males not applying, females 
applying and females not applying. The national statistics described earlier, which 
showed females to have more O-level passes than males, were similarly replicated 
here, with the females of this sample tending to have a greater number of O-level 
passes than their male counterparts. Since the sexes were also categorised according 
to their university intentions, it was considered useful to carry out a two-way 
analysis of variance on these data to investigate whether the number of O-level passes
differed significantly according to sex, intention and/or an interaction between the
two. Neither the effect due to sex nor the interaction effect reached statistical
significance (P > 0.05). However, as might be expected, both males and females
who intended to apply to university had a significantly greater number of O-level
passes than those who did not (F(1, 178) = 8.79, P < 0.01).
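Because the four groups differ in size, the marginal means for sex and for intention are weighted averages of the cell means in Table 2. The sketch below computes them for the number of O-level passes. It is descriptive only and does not reproduce the F tests, which would require the individual pupils' scores; since the tabled cell means are rounded, the marginal means are approximate. The helper name `marginal_mean` is an illustrative assumption.

```python
"""Weighted marginal means of O-level passes from the Table 2 cell means.

Descriptive sketch only: cell sizes and (rounded) cell means come from
Table 2; the helper name marginal_mean is an illustrative assumption.
"""

# (sex, intention) -> (number of pupils, mean number of O-level passes)
CELLS = {
    ("male", "applying"): (64, 7.8),
    ("male", "not applying"): (31, 6.6),
    ("female", "applying"): (42, 8.1),
    ("female", "not applying"): (45, 6.8),
}


def marginal_mean(position: int, level: str) -> float:
    """Pupil-weighted mean over all cells whose key has `level` at
    `position` (0 = sex, 1 = intention)."""
    selected = [(n, m) for key, (n, m) in CELLS.items() if key[position] == level]
    total_n = sum(n for n, _ in selected)
    return sum(n * m for n, m in selected) / total_n


if __name__ == "__main__":
    for level in ("male", "female"):
        print(f"{level}: {marginal_mean(0, level):.2f}")
    for level in ("applying", "not applying"):
        print(f"{level}: {marginal_mean(1, level):.2f}")
```

The sex means differ by only about 0.02 of a pass, whereas the intention means differ by about 1.2 passes, consistent with the pattern of significance reported above.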


Very similar results were obtained when the number of O-level points was
calculated for each subject [such that A=6, B=5, C=4, D=3, E=2, and U=1]
and entered into an analysis of variance. Again, whilst the trend was for females to
have a greater total number of O-level points than males, this difference neither reached
statistical significance nor interacted with intention. Intention itself again showed a
main effect, however, such that those intending to apply to university had a
significantly greater number of points than those not intending to apply
(F(1, 178) = 18.28, P < 0.001).
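The points scale just described amounts to a simple lookup per subject. The Python sketch below is illustrative: the function names and the example set of grades are hypothetical, and it assumes, as in the text, that a pass means grades A to C inclusive.

```python
"""Score a pupil's O-level grades on the scale A=6, B=5, C=4, D=3, E=2, U=1.

Function names and the example grades are hypothetical illustrations.
"""

GRADE_POINTS = {"A": 6, "B": 5, "C": 4, "D": 3, "E": 2, "U": 1}
PASS_GRADES = {"A", "B", "C"}  # passes are grades A-C inclusive


def total_points(grades: list[str]) -> int:
    """Sum the points for every subject taken."""
    return sum(GRADE_POINTS[g] for g in grades)


def number_of_passes(grades: list[str]) -> int:
    """Count subjects graded A, B or C."""
    return sum(1 for g in grades if g in PASS_GRADES)


if __name__ == "__main__":
    # A hypothetical pupil who sat eight O-levels.
    example = ["A", "B", "B", "C", "C", "D", "E", "U"]
    print(total_points(example), number_of_passes(example))
```

Because every subject taken earns at least one point, the points total reflects the number of subjects sat as well as the grades obtained, which is one reason it need not rank pupils exactly as the count of passes does.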


The percentages of males and females who had taken selected O-level
subjects are given in Table 3. The males in this sample were more likely than the
females to have taken the traditionally masculine subjects of mathematics, physics, 
geography, chemistry and technical drawing. Females on the other hand were more 
likely to have taken the traditionally feminine subjects of English, biology, art and 
needlework. An interesting, although not unexpected, outcome was that those 
applying to university had taken more academically oriented subjects than those not 
applying, e.g., history, geography, sciences, mathematics etc., whilst the reverse was 
true for the “craft” or less academically oriented subjects, e.g., art, religious
education, needlework (for females) and technical drawing (for males). 


TABLE 3 
GCE O-LEVELS TAKEN IN SELECTED SUBJECTS BY SEX AND INTENTION, AS A PERCENTAGE OF TOTAL GROUP 


MALES FEMALES 
Subject Applying Not Applying Applying Not Applying 
English 95 92 99 93 
Maths 99 98 83 68 
Physics 97 82 73 61 
Geography 72 51 34 31 
Chemistry 92 82 72 72 
History 46 43 53 32 
Biology 62 43 92 71 
Art 17 21 46 51 
Religious Education 6 1 42 53 
Technical Drawing 22 45 0 0 
Needlework 0 0 26 42 


The sex-stereotyping of subject choices was not limited to O-level subjects.
Similar differences were found between the A-level subjects chosen by females and 
males (Table 4). Again, males were more likely than females to take mathematics, 
physics and chemistry, whilst the reverse was true for English, biology, art, religious 
education and needlework. 


The remaining results and analyses to be described investigated sex differences 
in the numbers of A-levels taken and the grades expected by pupils. Table 2 shows 
the mean number of A-level subjects taken by pupils as a function of their sex and 
intention regarding university application. Whilst males applying and not applying 
and females applying differed little in the number of A-levels studied, there was a
TABLE 4
GCE A-LEVELS TAKEN IN SELECTED SUBJECTS BY SEX AND INTENTION, AS A PERCENTAGE OF TOTAL GROUP

MALES FEMALES 
Subject Applying Not Applying Applying Not Applying 
English 33 44 67 57 
Maths 63 56 34 22 
Physics 62 52 35 7 
Geography 21 17 22 5 
Chemistry 47 32 22 14 
History 30 28 23 28 
Biology 21 13 28 37 
Art 10 33 20 21 
Religious Education 0 0 0 12 
Needlework 0 0 0 10 
French 4 0 7 12 


large decrease in the number studied by females not intending to apply. This is 
representative of the statistics presented in the introduction which showed the 
difference in A-level passes to exist at the “three or more subjects” level, which had
more male than female members. Thus, whilst in this sample nearly all males (89 per 
cent) staying in the sixth form were studying three or more subjects, fewer females 
were doing so (72 per cent). This finding that females, especially those not intending 
to apply to university, take fewer A-levels than other sixth form pupils was tested in 
a two-way analysis of variance. 


Significant effects of sex (F(1, 178) = 11.56, P < 0.001) and a sex × intention
interaction (F(1, 178) = 5.19, P < 0.05) were obtained. Thus, a picture emerges of
females, highly qualified in terms of numbers and grades of O-levels, who chose to
study fewer subjects at A-level than their male counterparts, in mostly feminine
sex-typed subjects. Many more females (28 per cent) than males (11 per cent) staying on
at school for two years were taking two or fewer A-levels.


Table 2 also shows the mean total expected number of A-level points (calculated 
for each subject such that A=6, B=5, C=4, D=3, E=2, and U=1) and the mean
expected average grade (the total expected number of A-level points divided by the 
number taken by that pupil) for each group. 
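The expected average grade is therefore a per-pupil ratio: expected points divided by the number of A-levels taken. A minimal Python sketch, with hypothetical function names and an invented example pupil, assuming the same A=6 to U=1 scale:

```python
"""Expected total A-level points and expected average grade for one pupil.

Names and the example pupil are hypothetical; the scale is A=6 ... U=1.
"""

GRADE_POINTS = {"A": 6, "B": 5, "C": 4, "D": 3, "E": 2, "U": 1}


def expected_total_points(expected: list[str]) -> int:
    """Sum of the points for the grades a pupil expects."""
    return sum(GRADE_POINTS[g] for g in expected)


def expected_average_grade(expected: list[str]) -> float:
    """Expected total points divided by the number of A-levels taken."""
    if not expected:
        raise ValueError("pupil must be taking at least one A-level")
    return expected_total_points(expected) / len(expected)


if __name__ == "__main__":
    # Hypothetical pupil expecting B, C and C in three subjects.
    pupil = ["B", "C", "C"]
    print(expected_total_points(pupil), round(expected_average_grade(pupil), 2))
```

Because the group values in Table 2 are means of these per-pupil ratios, the tabled mean expected grade need not equal exactly the mean total points divided by the mean number of A-levels; for females applying, for instance, 12.4 / 2.9 is about 4.28, against a tabled 4.2.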


These show that for both males and females, those intending to apply to 
university expected a greater average A-level grade and a greater total number of 
A-level points. Given their slightly higher O-level grades described in the 
introduction, these higher expectancies do not appear unreasonable. However, 
within the distinction of whether or not they intend to apply to university, female 
subjects expected lower grades than did their male counterparts, resulting in a lower 
expected total number of A-level points. These effects were tested in the usual way,
using two-way analysis of variance. Main effects of sex (F(1, 173) = 4.56, P < 0.05 and
F(1, 173) = 8.21, P < 0.05) and intention (F(1, 173) = 34.89, P < 0.001 and
F(1, 173) = 33.45, P < 0.001) were obtained for the mean total number of expected
A-level points and the mean expected average grade respectively.


Thus, the females in this sample, despite having received equally positive 
feedback from their O-level grades as the males, still had significantly lower 
expectancies when considering their future A-level grades. 
DISCUSSION


The almost equivalent O-level grades of males and females described in the 
introductory review as well as the increasing numbers of females who stay on at 
school to take A-levels, have resulted in some optimistic conclusions being drawn by 
reviewers of the statistics. Wardle (1978) argued that what is happening is a process 
of equalisation from the bottom up, with a balance being reached in about 1950 up
to O-level, with considerable progress being achieved in A-levels and above. 


Such a view can certainly be supported by a reading of the A-level statistics 
since the number of passes obtained by girls has increased far more rapidly than 
those by boys. Between 1952 and 1965, boys’ A-level passes rose by 207 per cent and 
girls’ passes by 279 per cent. Between 1965 and 1976 they rose by 27 per cent and 71 
per cent respectively. Wardle argues that boys' results are approaching saturation 
while the girls’ continue to advance, and that it is hard to resist the conclusion that 
equality in the total number of passes must be reached in the fairly near future. 


Whilst such a conclusion points to the increasing participation of females in 
A-level education, it draws attention away from two important differences between 
the sexes which show very little evidence of decreasing. 


The first of these, the sex stereotyping of subject choices, has been strongly 
identified throughout this paper both at O- and A-level. The detailed breakdown of 
A-level entries and grades in Table 1 shows the prevalence of these differences as 
recently as 1983, even when females represented 47 per cent of all A-level entrants. It 
is important, therefore, not to simply talk of increasing numbers of females taking 
A-level examinations, but to qualify this by describing the subjects that they are 
taking. There still remain strongly established attitudes and expectations about 
courses appropriate to boys and girls, and these are none the less influential for 
being largely unspoken and frequently unconscious. These may be reflections of 
prejudices about the type of employment suitable for the two sexes and as such are 
likely to lead to serious consequences for the type of employment and the level of 
promotion and responsibility to which able girls can aspire. Schools, and those 
determining education policies, have some responsibility for perpetuating such 
attitudes, and for this reason a qualification is necessary to the statement that 
effective equality has been achieved up to about A-level standard. 


The second important difference which remains between the sexes is that of the 
actual A-level grades attained. It can be seen in Table 1 that for all subjects other 
than geography and domestic sciences a higher proportion of males than females 
obtained grade A in the examination. Looking at examination results over all 
subjects (total), a higher proportion of males than females attained grade A and B 
passes, whilst the reverse was true for grades C, D and E. This difference is obscured 
if, as is usually the case, only the proportion of passes is taken into account (68.6
per cent for females and 69.6 per cent for males). Furthermore, this discrepancy
between the sexes is not on the decrease, as it is for the number of A-level entrants or 
the number of passes. Figures 1 and 2 show that, if anything, this difference has 
increased over the recent past. 
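The point that near-identical overall pass rates can hide a male advantage at the top grades can be checked from the "Total for all subjects" rows of Table 1. The sketch below is illustrative (the data layout and function names are assumptions); it recomputes the overall pass rates and sums the grade percentages into an A+B band and a C to E band.

```python
"""Overall pass rates versus top-grade shares, from the Table 1 totals (1983).

Figures are copied from the "Total for all subjects" rows of Table 1;
the data layout and function names are illustrative assumptions.
"""

TOTALS = {
    "females": {
        "entries": 298_274,
        "passes": 204_743,
        "grade_pct": {"A": 7.9, "B": 13.2, "C": 12.3, "D": 15.3, "E": 19.9},
    },
    "males": {
        "entries": 333_853,
        "passes": 232_294,
        "grade_pct": {"A": 10.7, "B": 14.1, "C": 12.0, "D": 14.1, "E": 18.7},
    },
}


def pass_rate(row: dict) -> float:
    """All passes as a percentage of entries."""
    return 100.0 * row["passes"] / row["entries"]


def grade_share(row: dict, grades: str) -> float:
    """Summed percentage of entries passing at any of the given grades."""
    return sum(row["grade_pct"][g] for g in grades)


if __name__ == "__main__":
    for sex, row in TOTALS.items():
        print(
            f"{sex}: pass rate {pass_rate(row):.2f}%, "
            f"A+B {grade_share(row, 'AB'):.1f}%, "
            f"C-E {grade_share(row, 'CDE'):.1f}%"
        )
```

The pass rates differ by under one percentage point, while the A+B share differs by 3.7 points in favour of males and the C to E share by 2.7 points in favour of females.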


When discussing the differences between males’ and females’ applications and 
admission to further and higher education courses, it is important to bear these 
results in mind. University application, for example, relies extensively on offers 
made conditional upon a certain A-level grade being achieved in a number of 
subjects by the applicant. Clearly, the pupils’ expectations, the teachers’ 
information concerning expected grades and the actual grades achieved will 
influence both the pupils' aspirations for, and success in, their applications. The fact 
that females receive an almost equal number of passes at A-level, but at lower
grades, should result simply in their gaining admission to those universities which do
not demand grades above their attainment level. However, this is manifestly not the
case. Far fewer females than males who are adequately qualified, even taking into
account their lower grades, consider applying for university admission. This lack of
motivation, therefore, appears more a cause than an effect of the lower performance
of females. Despite having proven, and received feedback of, their abilities at
O-level, females typically have lower expectancies and subsequently lower
performance at A-level.

FIGURE 1
PERCENTAGE OF MALES AND FEMALES ATTAINING GCE A-LEVELS, GRADES A AND B (1969-1983)

FIGURE 2
PERCENTAGE OF MALES AND FEMALES ATTAINING GCE A-LEVELS, GRADES C, D AND E (1969-1983)


For two reasons it seems likely that if there is a causal link between expectancy 
and actual results this comes from the former influencing the latter. Firstly, several 
empirical studies have found that when people are randomly assigned to high or low 
expectancy groups, the higher expectancy group tends to perform better than the 
lower expectancy group, a finding that suggests expectancy per se can influence 
actual performance (Tyler, 1958; Crandall, 1969; McMahon, 1973). Secondly, it 
does not seem that the lowered expectancies reflect a genuine lack of ability given the 
girls’ O-level results. 


Possible reasons for the girls’ lowered expectancies lie in the perception of 
school as “feminine” or “masculine” dependent upon level. Hutt (1979) describes
the perception by both males and females of schooling as becoming more 
“masculine” as the academic level increases, e.g., primary to secondary school, 
O-level to A-level and so on. Thus, whilst success at O-level may be considered both 
appropriate and relatively likely for females, the same does not seem to be true for 
A-levels. Hutt describes how, particularly in mixed sex schools, the aspirations and 
attainments of girls are constrained by many factors, one of which is undoubtedly 
their own and their male peers’ expectations for their scholastic performance, their 
career performance and their domestic role expectations. 


The studies reviewed in the introduction illustrated the strong effects of sex- 
typing on expectancies and support the hypothesis that such an influence could 
affect expectancies at different levels of academic attainment. The strong prejudices 
held in British society and, consequently, by its adolescent population, serve to
make higher education appear both less attractive and less attainable to the females 
within it. It appears that, at present, the educational process does little to ameliorate 
this situation for girls. In fact, with age, girls still show an increasing tendency to 
aspire, and to attain, less than boys. 
TEACHERS' REPORTED PRACTICES TOWARDS GIRLS AND
BOYS IN SCIENCE AND LANGUAGES 


By NORMAN WORRALL AND HELEN TSARNA
(University of London Institute of Education) 


Summary. Fifty-three science and 55 language teachers, male and female, were asked 
about their classroom practices with respect to a typical or average ability 14-year-old, 
either a girl or a boy. Questions ranged from the broad, e.g., amount of quality work
expected, through the more specific, e.g., waiting time on difficult questions, to the very
specific, e.g., frequency of smiling or eye contact. Up to eight of the 11 items showed a
pattern where girls were relatively “favoured” in languages, and boys in science, though
less markedly. The pattern generally held for both male and female teachers, and there 
was no evidence for boys or girls being favoured either overall or selectively by male or 
female teachers. Although science teachers expected lower achievement from girls, this 
appeared not to influence their teaching disposition adversely. Both the situation for girls 
in science and the less understood situation for boys in languages seem usefully 
elaborated by such classroom practice indicators. 


INTRODUCTION 


THERE is nowadays diverse evidence for gender differences in academic interest and 
achievement patterns (Eccles and Blumenfeld, 1985). It is well known, for example, 
that during the years of secondary education girls generally prefer arts subjects while 
boys prefer sciences, a fact which is reflected in this country as a distinct pattern in 
entries for school-leaving examinations (Mahony, 1985). 


In a representative study, Kelly (1981) considered three kinds of explanatory 
hypotheses for this pattern. The first hypothesis was cultural, or the idea that society 
neither encourages nor expects girls to achieve as well in science. The second was 
attitudinal, implying that girls might perform less well in science because their 
related attitudes are less positive, so that they enjoy science less, work less hard at it 
and accordingly achieve less. Kelly's third hypothesis concerning possible school 
factors is the one most relevant to the present study. She proposed that science might 
be presented in a way somehow less suited to girls, so that change in teaching style or 
a move to single-sex schools could improve girls’ achievement. 


Kelly (1981) noted that some gender differences already existed at 10 years, and 
Matyas (1982) found 9-year-old girls reported less experience with science activities 
although their interest was higher than that of boys. There are, in fact, considerable 
data on relative treatment of boys and girls in primary classrooms, although because 
science figures less in the primary curriculum the contrast has usually been between 
mathematics and reading. It has been found, for example, that teachers of 7-year- 
olds give more time and attention to girls on reading, but more to boys on 
mathematics (see Good and Findley, 1985, for review). 


Irrespective of curriculum subject, Dweck in North America has shown that 
girls receive more positive feedback than boys for non-academic behaviours such as 
quietness and neatness, while boys are both more reinforced for academic 
behaviours and generally receive more attention of all kinds (Dweck et al., 1978). It
has also been reported (Goebes and Shore, 1975; Clift, 1978) that at primary level 
both male and female teachers regard girls more favourably than boys, a fact that 
teachers themselves seem unaware of (Serbin et al., 1973; Guttentag and Bray, 
1977). 



However, evidence from secondary school suggests that teachers do an about- 
turn, preferring to teach boys in the belief that male pupils are more interesting and 
critical, and that their education is more important (Ricks and Pyke, 1973). Spear 
(1985) found there was good agreement between male and female science teachers in 
attitudes towards students: both felt science was less important for girls and held a 
similarly negative attitude to girls on orientation to science. Extending the 
comparison to teacher behaviours, Crossman (1984) found good consistency 
between male and female teachers in the pattern of boy favouring in sciences. 


There is actually rather little evidence from secondary school settings (Brophy,
1985b). One significant contribution is the work of Eccles (Parsons) and her
colleagues in the United States who have been concerned for a number of years with 
the question of why mathematics achievement of girls declines relative to that of 
boys across the age range of 10 to 18. Eccles et al. (1984), for example, find 
relatively strong support for the perceived value of mathematics or English as a 
mediator of gender differences in actual and planned subject choice, with a possible 
contribution from differences in ability-effort attribution patterns. 


Three studies on gender and science have taken the classroom as their point of 
reference. Galton (1981) first identified teacher behaviour clusters representing three 
teaching styles — Problem Solving, Informing, and Enquiring — and then 
attempted to relate these styles to child attitude and attainment scores. None of the 
attitude scales seemed to favour any single teaching style. However, one emerging 
feature was that girls disliked the problem-solving style heavily favoured by physics 
and chemistry teachers and preferred the informing style (transmissive-factual with 
minimal discussion) prevalent among biology teachers. 


Two studies have looked specifically at gender and science in terms of 
interaction patterns in the classroom. Morse and Handley (1985) found that for 13- 
and 14-year-olds, boys both initiated and received more teacher-pupil interactions, 
and were asked more instructional-type questions. Feedback for girls tended to be 
either of the restatement-clarification type or affective, rather than the more 
instructional feedback given to boys. Crossman (1984) found data reasonably 
consistent with these. She looked at five classes of 13-year-olds and their 10 science 
teachers, using the Flanders category system (FIAC) to classify interactions. Again 
there was a pattern of boys answering questions more often — even though not in 
fact asked more often — and presumably following from this, of boys' ideas more 
often being accepted and developed. Feedback to girls was less contingent on 
intellectual performance. Moreover, as already noted, teacher gender seemed not to 
be a strong factor. 


The foregoing review indicates that boy-girl differences in classroom treatment 
probably exist but vary in terms of school level and curricular subject. Further, 
although there are reasonable amounts of data on differences in teacher attitudes, 
information on classroom practices is relatively slight. As far as we know, other 
than the studies by Morse and Handley (1985) and by Crossman (1984) there is little
direct evidence of possible differences in the kinds of transactions that occur 
between boys and girls and their teachers in science and maths, and none for 
languages and arts. 


Unlike the issue of girls and science, little attention has been given by 
researchers to exploring boys’ underachievement in languages, especially in foreign 
languages. Instead, research has focused on boys' underachievement in reading and 
other verbal skills in the early elementary years (Maccoby and Jacklin, 1974). While 
the concern of the present study was also with “girls and science”, equal emphasis
was placed on the symmetrical question of “boys and languages” so that a more
balanced picture of the interaction between child gender and curriculum subject
might be obtained.


Behind the present study is what might be called a transmission hypothesis. 
Given that gender differences exist across curricular subjects, as evidenced by a 
variety of criteria, would it be possible to identify relatively fine-grained 
transactions that occur between teacher and pupil, which could be the vehicles for 
perpetuating such differences? Indicators offering insight of this kind might be 
teacher estimates of waiting time for a boy or girl to answer a question, how often 
the teacher expects to call on a given child, and expects that child to answer, how 
long the teacher claims to wait before praising or criticising, and how much 
corrective feedback is said to be offered. 


METHOD 

Subjects 

Sixteen schools were involved, all coeducational and within the Greater London 
area. Contact with teachers was made through part-time students at the Institute of 
Education who were also teachers in these schools. These students had no 
knowledge of the actual research question being pursued. Prospective subjects were 
told we were conducting a study into the neglected question of the classroom 
experience of the average ability child. Over 90 per cent of teachers approached 
agreed to participate: 53 science teachers (28 male: 25 female) and 55 language 
teachers (25 male: 30 female). (Science was defined for present purposes as physics 
or chemistry, and languages as English or French.) Since it was possible that the 
research might enter into staffroom conversations, a given school received only boy 
or girl forms, assignment being on an arbitrary basis, so that eight schools received 
boy forms and eight girl forms. It follows that individual teachers would be asked 
about the case of a girl or a boy: this carried the additional safeguard of not 
sensitising teachers to the comparative nature of the study, with attendant risks of 
distortion. 


The questionnaire 

The questionnaire was prefaced by a cover sheet which had a preamble about 
how research overconcentrates on high and low ability children, and how we were 
interested in the average child, together with instructions for answering the 
questions to follow. The instructions proposed that in order to make answers as real 
as possible “let us assume that this average-ability child is some boy, aged 14 or 15,
of normal demeanour and with no home background problems”. The cueing word
was either “boy” or “girl”, and the 14-15 age was chosen following Kelly (1981)
on the grounds that most pupils of this age have had some formal science. 


The second sheet was the questionnaire itself, and Table 1 shows the items, 
together with associated 10-point scales. It can be seen that questions are largely 
derived from research findings on ability-related discriminatory practices of the kind 
tabulated by Brophy (e.g., Brophy and Good, 1974; Brophy, 1985a) but here 
translated into process questions. For example, the reliable finding that “teachers
wait longer for smarter children” could be translated into a process question such
as, “Imagine yourself asking a typical girl in your class a difficult question: how
many seconds would you wait for an answer?”. By casting the study as being about
the situation of “the average child neglected by research” we hoped to fade out
emphasis on ability per se and use the questions to get at gender differences.
Questions 1, 2 and 3 were aimed at advice given and standards expected, but the 
bulk of the questions were designed to be diagnostic of classroom practices, ranging 
from how often the teacher would call on the child to answer, to the very particular 
level of how often they would expect to smile or make eye contact with this child.
Each question repeated gender cueing so that the boy-girl contrast was maintained 
across questions. 


Questions were loosely grouped and presented either as in Table 1 or in the 
reverse order. To minimise further question-order effects, the instructions 
encouraged teachers to read through the questions several times, then answer the 
questions in any order they wished. 


TABLE 1 
QUESTIONS ASKED IN THE GIRL CONDITION 


1. Assuming one substantial homework per week, how many pieces of good (not merely satisfactory)
work would you expect from the average ability girl over the whole term?
1 or less   2   3   4   5   6   7   8   9   10 or more   (homeworks)


2. How often over the term would you expect to advise her individually on academic matters?
1 or less   2   3   4   5   6   7   8   9   10 or more   (times)


3. Roughly, what proportion of average-ability children (assume girls for consistency) do you
encourage to think in terms of a career related to your subject when they show some interest?
10%   20%   30%   40%   50%   60%   70%   80%   90%   100%


4. Imagine yourself asking her an easy question; how long would you wait for her to answer?
10 or more   9   8   7   6   5   4   3   2   1 or less   (seconds)


5. And how many seconds for a difficult question?
1 or less   2   3   4   5   6   7   8   9   10 or more   (seconds)


6. If she gave a completely wrong answer to a routine question, how many such answers would you
accept before openly criticising her?
10 or more   9   8   7   6   5   4   3   2   1 or less   (answers)


7. And if she gave a completely correct answer to a routine question, how many such answers before
openly praising her?
1 or less   2   3   4   5   6   7   8   9   10 or more   (answers)


8. You find she makes a recurrent minor error in the process of standing and presenting some
material in class. How many times would you offer feedback in the hope of correction?
10 or more   9   8   7   6   5   4   3   2   1 or less   (times)


9. How often would you expect her to raise a hand to answer questions during a typical class?
10 or more   9   8   7   6   5   4   3   2   1 or less   (times)


10. How often would you call on her individually during a typical class?
1 or less   2   3   4   5   6   7   8   9   10 or more   (answers)


11. How often do you think you would smile at or seek eye contact with this average girl during a
typical class?
10 or more   9   8   7   6   5   4   3   2   1 or less   (times)


Note: Gender-cueing words were changed to suit the Boy condition. Actual question order was: 2, 9, 10,
11, 4, 5, 6, 7, 8, 1, 3, or the reverse.
RESULTS

Design factors were Pupil Gender, Teacher Gender, and Curriculum Subject
(science/languages). A multivariate analysis of variance over all questions yielded
several points of interest. First, as Table 2 shows, Pupil Gender as a main effect is
not significant. Thus there is no overall “favouring” of one gender against the other
on these 11 items. However, Curriculum Subject differences emerge as highly
significant overall. Moreover, the Pupil Gender by Curriculum Subject interaction,
which carries the main research hypothesis, is also significant. It can also be
seen that Teacher Gender differences are operative both as a main effect and in
interaction with the subjects they teach. Pupil Gender by Teacher Gender is not
significant, so that not only is there no evidence for teachers favouring girls or boys
in general, but neither can it be said that, for example, female teachers generally
favour girls or male teachers boys. Finally, there is a marginally significant (P =
0.06) three-way interaction.


We now treat the univariate analyses for each question, referring back to the 
multivariate index to qualify any significant univariate F values, as necessary. For 
each question, univariate F values are given under the corresponding panel in Figure 
1. The interaction of focal interest — Pupil Gender with Curriculum Subject — is 
given in every case, together with any other effects that are significant. 


TABLE 2 
SUMMARY TABLE FOR THE MULTIVARIATE ANALYSIS OF 
VARIANCE 
Source                          df      F       P
Curriculum Subject (CS)         11    5.13    0.001
Pupil Gender (PG)               11    0.89    0.54
CS x PG                         11    3.10    0.001
Teacher Gender (TG)             11    2.54    0.01
CS x TG                         11    2.06    0.03
TG x PG                         11    1.23    0.35
CS x TG x PG                    11    1.82    0.06
Error                           90


Note: Significant sources of variance in the univariate analysis 
for each question are listed under the corresponding panels in 
Figure 1. 
FIGURE 1


Each panel shows results for the corresponding question. Under each panel is given
the main F value of interest (Pupil Gender x Curriculum Subject) together with
any other values that are significant. See Table 1 for complete wording of question and
applicable units for the ordinate.


Note: PG = Pupil Gender; CS = Curriculum Subject; TG = Teacher Gender; B = Boys;
G = Girls; M = Male Teachers; F = Female Teachers.
Question 1 asked how many pieces of good work would be expected from the
average boy (girl) over the term; Question 2 asked how frequently the average boy 
(girl) would get academic advice, and Question 3 asked what proportion of average 
boys (girls) would be given career encouragement. 


The corresponding panels in Figure 1 show that the pattern for all three 
questions is similar and provides some degree of common corroboration. Moreover, 
as indicated under each panel, in each case only one effect is significant, and that is 
the key interaction between Pupil Gender and Curriculum Subject. Higher 
standards are expected of boys than girls in sciences, but of girls than boys in 
languages. Boys get more advice than girls in science; girls more advice than boys in 
languages. Similarly, more girls get career encouragement in languages, though the 
boy-girl contrast in sciences is negligible. 


The effect is quite precise. Within none of the three questions is there a general 
difference between science and language practices, or between practices to boys and 
girls in general, or a different practice by male or female teachers: the effect is 
confined to one of relative boy-girl treatment within sciences and languages and by 
all teachers. 


A major concern of the study was whether differentiation could be traced down 
to more particular levels as represented in specific classroom practices. Question 4 
concerned waiting time for easy questions. In Figure 1, Panels 4a and 4b show this 
was not a discriminator, since there are no significant contrasts other than the three- 
way interaction. It appears from Panel 4b that whereas science and language 
teachers agree reasonably well about girls, they claim strikingly different practices 
for boys. We have no auxiliary theory to handle such a finding. Waiting time may in 
general not be a very revealing indicator, since although the crossover pattern does 
emerge for the case of difficult questions (Panel 5a), the supporting F value is only 
marginally significant. The feature that does emerge is that all female teachers claim
to wait longer than male teachers for answers to a difficult question (Panel 5b). 


Question 6 asked how many wrong answers to routine questions would be 
accepted before criticising. In Panel 6a, the significant Pupil Gender by Curriculum
Subject interaction substantiates the crossover pattern to the effect that more 
tolerance is claimed for boys in science and relatively more for girls in languages. 


There are other subsidiary findings for Question 6 which are worth bringing 
out. There is a significant science vs. languages main effect for readiness to criticise 
in Panel 6b, and it is also clear that science teachers present themselves as more 
tolerant. The significant interaction between Teacher Gender and Curriculum 
Subject can be seen to be mainly in male language teachers’ claiming more tolerance 
of wrong answers than their female counterparts. 


Question 7 asked about delay before praising. The key Pupil Gender by 
Curriculum Subject interaction is again significant (Panel 7a) and the graphs show 
an identical pattern. Thus girls in science are criticised and praised earlier (while 
boys are of course criticised and praised later). In languages on the other hand, girls 
are criticised and praised later (while boys now receive both earlier). This pattern is
discussed later in the context of the general results. 


On subsidiary findings, Panel 7b shows there is a significant tendency for male 
language teachers to praise more readily than female language teachers, with a slight 
reversal of this in science. Thus, female language teachers both criticise and praise 
somewhat sooner than their male colleagues. 


Question 8 asked about amount of corrective feedback offered. As Panel 8 
shows, these data also conform to the crossover pattern but individual ratings are 
evidently too variable since the interaction for Pupil Gender with Curriculum
Subject is far short of significance. 


Questions 9 and 10 concerned frequency of expected hand raising and of 
actually calling on a child to answer a question. The key Pupil Gender by 
Curriculum Subject interaction terms are again significant for both questions.
Panel 9 shows that boys are perceived as raising their hands relatively more by 
science teachers, whereas girls are correspondingly perceived by language teachers 
— again the crossover relationship. Also evident in Panel 9 is a general reported 
difference between science and language classes in the extent to which hand-raising 
to answer a question is expected. 


For calling on children to answer, Panel 10a shows that the Pupil Gender with 
Curriculum Subject interaction is again significant. However, the main contrast is in 
languages, where girls are clearly thought to be called on more than boys. The 
significant three-way interaction (Panel 10b) also cautions against a too general 
interpretation here. 


The last question, Question 11, concerned empathy or non-verbal rapport 
(smiling and eye contact). Panel 11 suggests girls are viewed as favoured in 
languages, with no gender difference at all in science. However, the required Pupil
Gender by Curriculum Subject interaction is this time not significant. There is nonetheless a
curricular difference, with language teachers representing themselves as showing 
more rapport than science teachers. 


DISCUSSION 


The general hypothesis that teachers’ reported classroom interaction practices 
would reveal relative favouring of boys in science, and relative favouring of girls in 
languages was supported with some qualification on up to eight of the 11 indicators 
examined. Six indicators clearly showed the required crossover interaction. These 
were: quality of work expected (Q1); academic (Q2) and career advice (Q3); delay 
before criticising (Q6) and before praising (Q7), and perceived frequency of 
answering questions (Q9). Two other indicators were more equivocal: calling on to 
answer questions (Q10), where the three-way interaction limited generality, and 
waiting time for difficult questions (Q5), where the significance was only marginal at 
P = 0.07. Data for the remaining three indicators, concerning non-verbal rapport
(Q11), amount of corrective feedback given (Q8) and waiting time on easy questions
(Q4) showed similar data patterns to those of the other indicators but crossover 
interactions were not significant. 


One particular qualifying comment is that pupil gender differences were 
relatively stronger on the languages than on the science side, as inspection across 
Figure 1 makes clear. For example, language teachers — and female teachers in 
particular — say they call on the average 14-year-old girl to answer almost twice as 
often as they call on the average boy, and also claim to give nearly twice as much 
career encouragement. Thus, although boys are to some extent favoured by all 
science teachers, girls are at least as much favoured by language teachers. 


It is also possible to pursue second-order implications within the present results. 
For example, if boys (or girls) were to be criticised later and praised later, this would 
indicate a relative lack of teacher concern. A different conjunction, “criticised
earlier/praised later”, would indicate teacher antagonism to that group, while the
converse, “criticised later/praised earlier”, would indicate relative favouring. In
fact, the pattern for girls in science is none of these. Rather, science teachers claim 
girls are criticised earlier, but also praised slightly earlier — a pattern implying more 
shaping than that received by boys and consistent with positive teacher concern. 
Moreover, boys even more clearly receive this same early praise-criticism pattern in
languages. Taken together, these results suggest that teachers believe they respond to 
manifest achievement appropriately with either a close or a light teaching style. 


Consistent with this interpretation, science teachers said they gave girls more 
time to answer difficult questions. This, in turn, can be seen as acceptance of and 
positive response to relatively lower ability, whereas waiting a shorter time would 
have implied being less patient. Further evidence of a positive disposition from 
science teachers is that even though they expect a lower quality of work from girls 
(Figure 1, Panel 1), actual differences in the way they say they treat boys and girls 
are often small or zero: see frequency of calling on for an answer (Panel 10a),
offering corrective feedback (Panel 8), waiting time on easy questions (Panel 4a) and 
amount of non-verbal rapport (Panel 11). 


In their study, Morse and Handley (1985) were concerned with amount rather 
than latency of feedback, but the present pattern is consistent with their finding that 
girls received more praise or criticism feedback in science than boys. However, 
Crossman's (1984) observations, consistent with those of Dweck et al. (1978),
indicated that praise for girls in science lessons was more often contingent on non-
intellectual considerations such as neatness. It is further possible that even such
intellectually relevant praise as is received may be qualitatively different — slightly 
more patronising, for example (cf. Brophy, 1981). Simultaneous analysis of 
feedback incidents in terms of all four criteria: frequency, quality, latency and 
contingency might provide the best insight into girls’ experiences in science classes.


As indexed by these measures, boys and girls do not have generally different 
classroom experiences, since in Table 2 the multivariate test for Pupil Gender 
showed differences were only at chance level (P = 0.54). The similar non-
significance for the Teacher Gender by Pupil Gender interaction gave no evidence
that male or female teachers were generally favouring girls or boys. Such a finding is 
consistent with the conclusions of Brophy (1985b) as well as with the data of Spear 
(1985) to the effect that all science teachers held the same relatively positive view of 
boys and negative view of girls. Nor could it be said here that male science teachers were
favouring boys, and female language teachers favouring girls: only on two items 
could the possibility of this be seriously considered. Thus the effect found is quite 
closely specifiable to be about girls-in-science, regardless of teacher, and boys-in- 
languages, again regardless of teacher. 


We would summarise the view offered from the present study as follows. While 
science teachers in general expect relatively lower achievement from the typical girl, 
their self-reported responses can be interpreted as showing concern to do something
about it. Within such a view, it is difficult to see science teachers as guilty of 
somehow compounding gender biases that have arisen in society at large. 


Although the pattern of findings in the present study is broadly consistent with 
studies that have used different indicators, it is worth asking how much generality 
may reasonably be claimed. The device was adopted of asking about an “average”
14-year-old. This, as noted, had the advantage of focusing on the typical rather than 
the special-case child, as well as fading out concern with ability as such and allowing 
gender to be salient. Nevertheless, commonsense caution is indicated in applying 
conclusions to extremes of the age or ability range. Again, school selection was not 
random, but none of the 16 schools approached declined to participate, returns 
from teachers were high at 90 per cent, and the schools themselves appeared 
representative of urban, mixed-sex comprehensives. 


A more explicit gender treatment might have yielded more general or larger 
differences. On the other hand, the fact that some items did not discriminate may be 
taken as evidence that teachers were giving selective and thought-out answers, and
were not operating under a “gender halo” for whatever question was asked.


In the end, the present “self-interview” approach can only argue on the basis of
what teachers reconstruct and then report in response to the questions asked. Accuracy
of recall or social desirability may bias some of these reports, but this itself is not 
necessarily serious. Findings would only be compromised were it to be shown that 
any such bias was systematically different for one gender or for science versus 
language teachers. 


Consistent with Kahle (1982), claimed practices in the present study provide 
little evidence of same-gender special relationships between teachers and pupils. The 
Pupil Gender by Teacher Gender interaction was not significant in the overall 
analysis, and it is even possible on individual questions to find cross-gender patterns 
of male teachers favouring girls and female teachers favouring boys. More 
especially, the responses of science teachers indicate not so much a turning away 
from girls but rather a positive responding to perceived shortcomings. As far as the 
question of encouragement to pursue a science career was concerned, they reported 
treating girls and boys alike. However, it may be relevant that although the empathy 
item (Question 11) showed no gender differentiation, science teachers as a group saw 
themselves as putting out fewer empathy signals, relative to their language teacher 
counterparts. Since females seem more responsive than males to such signals (e.g., 
Kleinke, 1986) this could be a contributory reason why girls do not easily model on 
science teachers — of either sex. 


Correspondence and requests for reprints should be addressed to Dr. Norman Worrall, 
University of London Institute of Education, CDEP, 24-27 Woburn Square, London, 
WC1H 0AA.
Summary. How does the structure of a database influence the user's organisation of
information within it, and the user’s retrieval of information from it? Three experiments 
investigated how young children (9-11 years of age) organised and retrieved words from a
number of structures. Subjects were given sets of 10 to 12 words and asked to organise
them on paper (Experiments 1 and 2) and to extract data from a pre-organised set of
words (Experiment 3). Each word set had a pre-defined organisation based on the 
semantic relationship between the words. The four organisations used in the three 
experiments were: lists, hierarchies, networks and table structures. Experiment 1 showed 
that young children have severe difficulties in organising information into hierarchies and 
even more difficulty with tables. Lists were the most common form of data organisation 
used even when that organisational structure was not the most appropriate. A few 
children produced idiosyncratic organisations, possibly due to a failure to recognise the 
semantic relationship among the words. In Experiment 2, while some children could fit 
the word sets into “skeletons” that were explicitly designed to maintain all the semantic
relationships between the words, the organisation of hierarchies and tables still proved 
difficult for most subjects. Experiment 3 showed, however, that young children can 
retrieve data from existing structures. These results are considered in relation to the 
increased use of computer-based information retrieval systems in schools. 


INTRODUCTION 


THE increasing use of information handling packages in recent decades has 
stimulated a number of investigations into how people organise data (Durding et al., 
1977; Brosey and Shneiderman, 1978). The underlying aim here is to facilitate the 
use of computer information packages by matching software structures to the 
organisational structures most readily available to humans. If matches are not 
possible, the software designer and the potential users, at least, need to be made 
aware of potential problems in the use of any program. Although the applied aim of 
the research might appear new, the theoretical and methodological approaches are 
based in the study of cognition, and, in particular, research on the structure of 
memory (Underwood, 1976; Anderson, 1980; Mayer, 1983). The three experiments 
presented here were designed to investigate how the structure of a database 
influences data-handling by schoolchildren. 


An assumption of this research on cognitive ergonomics is that memory is 
organised in such a way that new information can be easily entered and can be, at a 
later date, efficiently retrieved from the memory store; and the aim is to discover the 
nature of the specific organisational schema on which these high-level cognitive 
activities are based. Durding et al. (1977) have argued that there is substantial
experimental support for a number of organisational models, including lists, 
hierarchies and networks, but that no one model fully explains how human memory 
is functionally organised, because the type of organisation used is task-dependent. 


A number of organisational structures appear to be readily available for an 
operator's use. We are not restricted to using one data organisation because we have 
the flexibility to adapt our cognitive processes to changing information structures. 


Durding et al. confirmed that adults (undergraduates), when faced with word sets 
exhibiting a range of pre-defined organisational structures, were capable of 
recognising and making explicit those structures. The task presented subjects with 
word sets, 15-20 words long, and the subjects were asked to organise the word sets in 
a way which maintained and made explicit the semantic relationships among the 
words. Each word set had a natural organisation such as a tree, network, list or 
table. There was, however, a ranking of ease of use in the order: lists, hierarchies, 
networks and tables. Tables were the only organisational type with which subjects 
achieved less than a 50 per cent success rate. This order was not maintained when 
subjects were primed as to the appropriate organisation to use, by the inclusion of a 
skeletal diagram accompanying each word set. Lists and hierarchies were still the 
easiest to organise, but they were now followed closely by tables. The provision of 
an overt structure did little to facilitate the recognition of networks. A small number 
of the subjects tested in each condition applied the same list structure to all data 
regardless of the pre-defined organisational structure of the data, but the overall 
findings were that adults were aware and capable of constructing a variety of 
organisational structures. 


Rather than asking their subjects to construct organisational structures, Brosey 
and Shneiderman (1978) asked them to access information from databases, with 
either a tree (hierarchical) structure or a tabular structure. The tree structure proved 
to be an easier retrieval format, and it was also easier to commit to memory and 
reproduce than the tabular structure. 


The two sets of experiments together suggest that after lists the most accessible 
form for data is in hierarchies or trees. However, other organisational structures 
may be used with some measure of success if the subject is made aware of the 
relevance of that structure to the task in hand. 


The conclusion to be drawn here is that the type of data structure available
will influence the efficiency with which adults both enter material into, and retrieve
material from, a database. As the use of educational information handling packages grows, it
becomes pertinent to question whether these findings are applicable to younger 
subjects and, if so, what are the implications for the use of information handling 
packages in schools? The use of these packages is to be welcomed, not only because 
they will familiarise children with the concepts underlying a new information- 
processing technology, but also because they facilitate cognitive development 
(Underwood and Underwood, 1987). The most common knowledge-based systems 
for UK schools use either a hierarchical (e.g., SEEK, TREE OF KNOWLEDGE) 
or tabular (e.g., FACTFILE, INFORM) structure. SEEK and TREE OF
KNOWLEDGE both use a binary tree structure, in which information is sorted or
classified linearly through a simple series of non-probabilistic yes/no answers.
FACTFILE and INFORM both operate a matrix structure of fields and records. 
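The difference between the two kinds of structure can be illustrated with a short sketch. The Python below is hypothetical and is not taken from SEEK, TREE OF KNOWLEDGE, FACTFILE or INFORM; the items, questions and field names are invented. In the binary tree, an item is identified by answering one yes/no question at each node until a leaf is reached. In the table, every record has the same fields, and a query filters the records on the value of a field.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Question:
    """An internal node: a yes/no question with one branch for each answer."""
    text: str
    yes: "Node"
    no: "Node"


# A leaf is simply the name of the item that has been identified.
Node = Union[Question, str]


def classify(node: Node, answers: Dict[str, bool]) -> str:
    """Follow one yes/no branch per question until a leaf is reached."""
    while isinstance(node, Question):
        node = node.yes if answers[node.text] else node.no
    return node


# Invented binary tree: each item is reached by a fixed sequence of answers.
TREE = Question(
    "Does it have feathers?",
    yes=Question("Can it fly?", yes="sparrow", no="penguin"),
    no=Question("Does it have legs?", yes="dog", no="goldfish"),
)

# The same items stored as a table: records sharing the same fields.
RECORDS: List[Dict[str, object]] = [
    {"name": "sparrow", "legs": 2, "flies": True},
    {"name": "penguin", "legs": 2, "flies": False},
    {"name": "dog", "legs": 4, "flies": False},
    {"name": "goldfish", "legs": 0, "flies": False},
]


def select(records: List[Dict[str, object]], field: str, value: object) -> List[str]:
    """Return the names of all records whose field equals value."""
    return [r["name"] for r in records if r[field] == value]


print(classify(TREE, {"Does it have feathers?": True, "Can it fly?": False}))  # penguin
print(select(RECORDS, "legs", 2))  # ['sparrow', 'penguin']
```

In this sketch the tree fixes the order in which attributes are considered, whereas the table lets any field be queried, at the cost of scanning every record.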


The use of both types of knowledge-based system has been shown to aid the 
cognitive development of 9-11 year-old children (Underwood, 1986). Both SEEK 
and FACTFILE aided the use of subsequent categorisation strategies, in comparison 
with their non-machine based equivalents. In a further study, Giboin and Michard 
(1984) found children of 11-13 years of age developed good programming habits, 
focusing on the semantic rather than the syntactic features of programming, when 
exposed to a hierarchically structured programming tutor. Although the use of the 
hierarchical structure proved difficult for these children, initial emphasis on a top- 
down approach resulted in learners being progressively able to impose a hierarchical 
structure on their programming. 


A network structure has not commonly been chosen as the underpinning 
organisational structure for educational databases. The results of Durding et al.
(1977) and Brosey and Shneiderman (1978) would suggest that this is a wise
exclusion. Tables, and particularly hierarchies, are more likely to achieve a 
congruency between the existing organisation in the user's mind for a particular task 
and the representation of information by the computer to the user, and will 
therefore facilitate the successful manipulation of the database. 


The following investigation was designed to test the relevance of the research
of Durding et al. (1977) and Brosey and Shneiderman (1978) to the use of
knowledge-bases by children in the age-range 9 to 11 years. This age-range was
selected for two reasons. First, such children are a target population for those trying to
encourage the use of information handling packages in schools, because the
packages fit so well into the topic-work approach that forms a core element in UK
classrooms at this age. Secondly, work such as that of Olver and Hornsby (1966)
suggests that by 9 years of age, many children are capable of developing 
superordinate categories, a necessary precursor for defining complex organisational 
structures. 


The key questions in this research were: 


(1) Have young children a range of organisational structures readily available 
for storage and retrieval of information? 

(2) Do young children use different organisational structures from adults? 

(3) While children might have lower overall organisational proficiency than 
adults, is the pattern of that proficiency qualitatively the same; that is, do 
they produce the same rank ordering of ease of use for organisational 
structures as was found for adults? 


EXPERIMENT 1 


The first experiment was designed to determine the organisational structures 
available to young children using an experimental method similar to that of Durding 
et al. (1977). The objective of the experiment was to investigate the types of 
organisational structures young children would employ spontaneously when asked 
to sort word sets. The main question was whether the choice of data organisation 
employed would be determined by the pre-defined semantic relationships within 
each word set, or whether subjects would operate one predominant structure 
regardless of those pre-defined semantic relationships. 


In order to answer this question, subjects were asked to organise word sets 
which, unknown to them, had the following pre-defined structures: lists,
hierarchies, networks and tables. There were a number of reasons for choosing these 
particular data structures. In the first place hierarchies and tables form the 
underlying structure of a number of common educational computer information 
handling programs, such as binary trees (e.g., SEEK) and databases (e.g., 
FACTFILE and INFORM). In more advanced or commercial databases, networks 
are also widely used as the underpinning structure (Shneiderman, 1980). Secondly, 
the types of stimuli presented matched those in Durding's experiments. Hence 
comparisons were possible between the organisational structures available to adults 
and to children. 


METHOD 
Sample 
38 school children (mean age = 10.1 years, SD = 0.7), all members of either a
third or a fourth year class of a junior school, were tested for their reading ability
and for their non-verbal spatial ability. The mean raw reading score on McLeod's
(1970) GAP test was 21.5 (SD = 5.6), and the mean score on Raven's Matrices
(1956) was 16.2 (SD = 4.1). These figures are provided to facilitate comparison
across the three experiments. Mean scores for the samples used in the three 
experiments were similar. 


Materials 

The materials presented to each child consisted of five word sets of 10 or 12
words each. The words within each set were selected to conform to one of the
four pre-defined organisational structures (lists, hierarchies, simple tree
networks and tables) or, as a control condition, were random words designed to
have no apparent organisation. An example of each structure is found in Figure 1.


Two separate but similar sets of stimulus material were developed. There were,
therefore, two word sets corresponding to each of the four chosen organisational
structures, and there were two sets of twelve ‘‘random’’ words. This produced five
types of word set, with two instances of each.


Each subject received a booklet containing five word sets, each printed in a
single column on a separate sheet, and representing each of the five types of
experimental stimuli. The presentation order of the stimuli was randomised for each
subject.


FIGURE 1

EXAMPLES OF WORD SETS USED FOR EACH OF THE PRE-DEFINED ORGANISATIONS IN EXPERIMENTS 1 AND 2


Procedure 

Children were tested in their two class groups. The two groups of five word sets, 
which comprised the booklets, were given out alternately to the children in the class. 
Thus neighbouring children within a class worked on different word sets. 


The children were told that the words on each page were related in meaning and 
that their task was to discover the relationships among the words. Then they were to 
rewrite the words in such a way as to make the relationships they had discovered 
apparent to the experimenter. They were told that the words were not intended to 
make up a sentence and that the printed order of the words on the sheet was 
unimportant. A brief question session was allowed to make sure the children 
understood the task at hand. 


Emphasis was placed upon the lack of any time constraint. Time on task was 
not taken as a measure of performance because it would penalise those children with 
poorly developed writing skills. Children could have as many tries as they wanted
before coming to their final decision, but once they had turned over to the next
example they were not allowed to go back. They worked through the five examples at their
own pace, which took from 30 to 45 minutes. 


RESULTS 


Two scoring techniques were developed. The first scoring technique was a 
simple measure of the type of organisation used by each subject and matched that 
used by Durding et al. The second scoring technique measured the degree to which
subjects recognised relationships among words but did not require the child to 
demonstrate an overt recognition of the pre-defined organisational structure. 


Simple scoring technique 

This initial method of scoring was a measure of frequency of use of 
organisational structure per se and took no account of the accuracy or semantic 
sufficiency of the children's work. The results of this analysis are presented in 
Table 1. 


Children's own organisations were classified into the four pre-defined 
organisational structures to give the contrasts shown in Table 1, but it was apparent 
that they frequently used organisational structures other than those defined by 
Durding et al. Several actually identified the random stimuli as random,
commenting that they could see no pattern in the meaning of the words. Others 
simply reorganised the word set into alphabetical order or added synonyms to each 
of the stimuli words. More importantly, however, a number organised the word sets 
into pairings. This organisational structure can be thought of as primitive lists. The 
children's structures were classified into seven categories: lists, hierarchies, 
networks, tables, declared-random, primitive lists and others. ‘‘Others’’ included 
the alphabetical and synonym structures. 


Table 1 shows the percentage frequency of each type of subject organisation for
each type of pre-defined structure. On average, only 25 per cent of the word sets
were organised in a way consistent with their pre-defined structures. Further, it can
clearly be seen that the young children had a strong preference for list organisation;
62 per cent of all word sets were organised into list structures. Of word sets with a
pre-defined list structure, 82 per cent were organised into lists. Children had little
success with the other three pre-defined structures. Hierarchies (8 per cent) and
tables (0 per cent) proved particularly difficult for them, and were rarely organised
into their pre-defined structures. Indeed, no child offered a tabular organisation as a
possible answer in any of the conditions. The children were more likely to recognise
the lack of structure in the random word sets (16 per cent) than to recognise the
hierarchical or tabular nature of the data.


TABLE 1

PERCENTAGE FREQUENCY OF EACH TYPE OF ORGANISATION FOR EACH TYPE OF PRE-DEFINED
ORGANISATION IN EXPERIMENT 1 (SIMPLE SCORING TECHNIQUE, N = 38)

                            Pre-defined Organisation
Child's Organisation    List   Hierarchy   Network   Table   Random   Total
Lists                     82*         66        61      76       26      62
Hierarchy                  0          8*        0       0        0       2
Network                    0          3        21*      3        5       6
Table                      0          0         0       0*       0       0
Declared as Random         0          0         0       0       16*      3
Primitive List             8         18        11      13       47      20
Other                     11          5         8       8        5       7

Figures marked * indicate the % match between the pre-defined structure and the children's
organisation of a word set. All % are rounded to the nearest whole number.
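The 25 per cent figure can be reproduced from Table 1, assuming (the paper does not say so explicitly) that it is the unweighted mean of the five matching cells, with ‘‘declared as random’’ counted as the correct response to the random word sets. Because the printed cells are already rounded, this check is only approximate.

```python
# Matching cells of Table 1, in per cent, as printed (already rounded).
# Assumption: "declared as random" is the match for the random word sets.
matches = {"list": 82, "hierarchy": 8, "network": 21, "table": 0, "random": 16}

# Unweighted mean over the five types of word set.
mean_match = sum(matches.values()) / len(matches)
print(round(mean_match, 1))  # 25.4
```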


Complex scoring technique 

A second more detailed scoring technique was also used. Here, children's 
organisations were matched to the pre-defined structure of the experimenter. The 
recognition of the semantic relationships within the word sets was now important. 
Each child's score was incremented if they had maintained the relative position of 
the word within the experimenter's organisation, but an exact pattern match was not 
necessary. 


Each list word had only one relevant relationship — its placement into a group 
— but there was no ordering within the group. Each hierarchy and table word had 
two relevant relationships. For hierarchies any one word was related to words both 
above and below it in the hierarchy. For tables, relationships were defined within 
each row and column. A network word had one relevant relationship, its place 
within the group but not the order within the group, unless the word lay on a node. 
In that case the word would have two relationships. 


Thus for a list, a score of 12 points could be achieved by placing the 12 words 
into the three pre-defined groups. Although many children achieved a full score, 
others recognised only two groups or failed to find the fourth member of one or 
more groups. For networks, where 10 words completed a set, one point was awarded 
for correctly designating each member of the group, as with the lists, but words with 
dual meanings could acquire two points if appropriately placed, that is if the child 
recognised its membership of both groups. For the hierarchies, children were 
required to recognise the subordinate and superordinate members associated with
any word, achieving half a point for each. For tables, recognition of the class type 
(table column) and semantic relative (table row) each received half a point. 
Altogether a score of 12 points could be gained for each of the pre-defined 
organisational structures. The control condition, random words, did not feature in 
this analysis as it was not possible to define a correct response. 
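For lists and networks the scoring rule amounts to counting, for each pre-defined group, how many of its members the child placed together. The Python below is a minimal sketch under two assumptions not stated in the paper: the child's response has been coded as a collection of groups, and each child group is paired with at most one pre-defined group so that the total overlap is as large as possible. A word belonging to two pre-defined groups (a network node) earns a point for each group in which the child places it. The function name and the word set are invented.

```python
from itertools import permutations
from typing import Iterable, List, Set


def group_match_score(key_groups: List[Set[str]],
                      child_groups: Iterable[Iterable[str]]) -> int:
    """Count words the child placed with their pre-defined group.

    Each child group is paired with at most one pre-defined group, and the
    pairing giving the largest total overlap is used. A word belonging to
    two pre-defined groups (a network node) can therefore score twice.
    """
    child = [set(g) for g in child_groups]
    # Pad with empty groups so that every pre-defined group gets a partner.
    child += [set() for _ in range(len(key_groups) - len(child))]
    best = 0
    # Brute force over pairings; adequate for three or four groups.
    for pairing in permutations(child, len(key_groups)):
        overlap = sum(len(k & c) for k, c in zip(key_groups, pairing))
        best = max(best, overlap)
    return best


# Invented list-type word set: three groups of four words (maximum 12 points).
KEY = [
    {"red", "green", "blue", "yellow"},
    {"ball", "teddy", "doll", "kite"},
    {"hour", "day", "month", "year"},
]

# A child who found all three groups but misplaced "kite".
answer = [
    ["red", "green", "blue", "yellow"],
    ["ball", "teddy", "doll"],
    ["kite", "hour", "day", "month", "year"],
]
score = group_match_score(KEY, answer)
print(score, round(100 * score / 12, 1))  # 11 91.7
```

Brute-force pairing is only practical for a handful of groups; a larger problem would call for an assignment algorithm such as the Hungarian method.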
A one-way, within-subjects analysis of variance showed that the children
responded differently to the different implicit data organisations (F = 50.2;
df = 3, 108; P < 0.0001); that is, they were not able to match the pre-defined
structure equally successfully in all cases.
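A one-way within-subjects analysis of variance partitions the variation in scores into a conditions component, a subjects component and a residual (error) component, and tests the conditions mean square against the residual mean square. The Python below is a generic textbook computation, not the authors' own analysis, and the three-child data set is invented. For k conditions and n subjects with complete data, the degrees of freedom are k − 1 and (k − 1)(n − 1).

```python
from typing import Sequence, Tuple


def repeated_measures_anova(
        scores: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, int]]:
    """One-way within-subjects ANOVA.

    scores[i][j] is subject i's score in condition j; every subject must
    have a score in every condition. Returns F and its degrees of freedom.
    """
    n = len(scores)      # subjects
    k = len(scores[0])   # conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # The remainder (subject-by-condition interaction) serves as the error term.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, (df_cond, df_error)


# Invented scores: three children, three conditions.
DATA = [
    [2, 4, 6],
    [3, 4, 8],
    [1, 4, 4],
]
print(repeated_measures_anova(DATA))  # (12.0, (2, 4))
```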


TABLE 2

MEAN PERCENTAGE SCORE FOR THE DEGREE TO WHICH ORGANISATIONS MATCHED THE EXPERIMENTER'S
PRE-DEFINED STRUCTURE (COMPLEX SCORING TECHNIQUE; MAXIMUM ACHIEVABLE SCORE 12; N = 38)

Pre-defined Organisation       List   Network   Hierarchy   Table
Mean % score                   74.3      51.4       29.9a   27.8a

Performance contrasts in Experiment 1. Conditions marked with the same letter (a) did not
differ from each other, according to Tukey's HSD test (P < 0.01).


Mean percentage scores representing the degree of organisational match 
achieved between the children and the experimenter, for each of the four pre-defined 
organisational structures, are shown in Table 2. The table also shows the results of 
the paired comparisons using Tukey's HSD test. Hierarchies and tables did not 
differ from each other, but they did differ from both lists and networks which, in 
turn, differed from each other. 


DISCUSSION 


This investigation sought to answer several questions. The first question was 
whether young children can organise information according to a pre-defined 
structure. The corollary of this question is whether young children use a single 
organisational structure despite the internal organisation inherent in that 
information. 


A comparison of these data with the results from Durding et al. (simple scoring 
technique only — Table 1) revealed not only a quantitative but also a qualitative 
difference in responses. These children were less successful at identifying the pre- 
defined structure (25 per cent success rate) than the adults (59 per cent) of Durding et 
al. Although lists were recognised almost as easily by the children (82 per cent) as by 
the adults (96 per cent), there was a very poor response to other types of 
organisation. Hierarchies caused particular difficulties for the children (8 per cent) 
as compared to the adults (79 per cent correct responses). Networks proved 
relatively less difficult for the children (21 per cent) than hierarchies, a reversal of
the Durding et al. results. Tables, however, were poorly handled by both groups.


Consistent with Durding's findings, children largely reverted to list structures if 
they could not recognise the semantic structure in the data, or if they were having 
trouble with a more constraining structure, that is a structure which required them 
to recognise more than one semantic relationship. For example, the semantic
relationships along the rows (semantic relative) of a table were often recognised,
while the relationships down the columns (class type) were not. Of the non-random word
sets organised differently from their pre-defined structures, 69 per cent were
classified as lists, 0 per cent as hierarchies, 2 per cent as networks, 0 per cent as
tables, 19 per cent as primitive lists and 13 per cent as other types of erroneous
organisation.
The failure to use a range of organisational structures may have been due to a
lack of familiarity with the more constraining organisational structures, or to the
limited cognitive ability of many children to work with more than one dimension at
a time, as would be required with the tabular structures in particular, or to the
mode of presentation of the stimuli. In the latter case, list structures might have been
induced by the columnar presentation of the stimuli. This possibility exists both here
and in the experiments of Durding et al. Evidence against this argument comes from
a pilot study with a similar sample of children. Lists and primitive lists were the most
frequently occurring organisational structures used by children when asked to
organise a set of 24 picture cards randomly laid out on a table top.


EXPERIMENT 2 


Experiment 1 found that children had difficulty in organising sets of words into 
tabular and hierarchical organisations. Hence, the next experiment was designed to 
determine whether these organisations could be constructed when the structure of 
each word set was made explicit. Experiment 2 required children to organise the 
same word sets as in Experiment 1, but this time the words were to be placed in 
skeletal structures which made apparent the pre-defined semantic structure of each 
word set. 


METHOD 
Sample 
52 school children (mean age = 10.0 years, SD = 0.5), all members of either a third or
a fourth year class of a junior school, were tested for their reading ability and for
their non-verbal ability. The mean raw reading score on McLeod's (1970) GAP test
was 20.5 (SD = 9.8), and the mean non-verbal ability score on Raven's Matrices
(1956) was 16.8 (SD = 5.3). These children did not participate in Experiment 1.


Materials 

The materials were the same word sets as used in Experiment 1, with the exclusion of
the control condition, random word sets, as it was not possible to draw an
organisational structure for these arbitrary sets of words. A skeletal diagram
consisting of a system of boxes and non-directional links, into which the words were
to be placed, was presented below each word set. The diagrams appeared as in Figure
1 but without the words in place. The diagrams were placed below the word sets as
the task sequence would suggest. Ordering of word set presentation was randomised
across children.


Procedure 

Children were tested in their two class groups. A booklet containing four word 
sets, each as a column of words representing one of the four pre-defined 
organisational structures (lists, hierarchies, networks and tables), with the 
appropriate skeletal diagram highlighting the structural organisation of the word set 
below, was presented to each child. The two groups of four word sets, which 
comprised the booklets, were given out alternately to the children in the class. Thus 
neighbouring children within a class worked on different word sets. 


Instructions to the children were as in Experiment 1 with the additional 
description of the skeletal structure and the use that they were to make of it. They 
were told that each box represented the space to be occupied by one of the words on 
the page above the diagram. The lines joining one box to another showed that the 
two words were related to each other in some way. They were asked to sort the 
words into an order which highlighted the relationship between them, and they were 
to put each word into the appropriate box. They worked through the four examples
at their own pace, which took from 25 to 50 minutes. 


RESULTS 
Two scoring techniques were employed, as in Experiment 1. 


Simple scoring technique 

An initial simple analysis assessed the type of organisational structure chosen in
each instance by the children, with no reference to the accuracy or semantic
sufficiency of the subjects' work. The subjects' organisational structures were
classified into six categories: lists, hierarchies, networks, tables, primitive lists and
others. The seventh category from Experiment 1, declared-random, was not
appropriate here as each word set had a pre-defined structure. Again, these
categories described 100 per cent of the subjects' organisations. The results of this
analysis are presented in Table 3.


TABLE 3

PERCENTAGE FREQUENCY OF EACH CLASS OF ORGANISATION FOR EACH CLASS OF PRE-DEFINED
ORGANISATION IN EXPERIMENT 2 (SIMPLE SCORING TECHNIQUE, N = 52)

                            Pre-defined Organisation
Child's Organisation    List   Hierarchy   Network   Table   Total
Lists                     92*         21        48      67      57
Hierarchy                  0         21*        0       0       5
Network                    0          0        39*      4      11
Table                      0          0         0       6*      1
Primitive List             4         52        12      21      22
Other                      4          4       0.5       4       3

Figures marked * indicate the % match between the pre-defined structure and the children's
organisation of a word set. % are rounded to the nearest whole number.


Table 3 shows the frequency of each type of organisation for each type of pre- 
defined structure. Only 39 per cent of the word sets were organised in a way 
consistent with the experimenter's pre-defined structures. There is again a strong 
preference for list structures: 57 per cent of all word sets were organised in this way. 
Children organised 92 per cent of the list word sets into list structures, but were less 
successful in using the other three pre-defined organisations. Only 21 per cent of
hierarchies, 39 per cent of networks and 6 per cent of tables were detected.


Complex scoring technique 

The second scoring technique, which assessed the degree of match between 
responses and the experimenter's pre-defined structures, was applied here as in 
Experiment 1. An analysis of variance confirmed the findings of the simple analysis,
in that children responded with varying success to the four types of data
organisation (F = 116.5; df = 3, 150; P < 0.0001).


Mean scores for the degree to which the children's own organisations matched the
pre-defined structures are shown in Table 4. Multiple paired comparisons (Tukey's HSD
test) are also indicated in the table. Hierarchies and tables did not differ from each
other, but they did differ from both lists and networks which, in turn, differed from
each other.


TABLE 4

MEAN PERCENTAGE SCORE FOR THE DEGREE TO WHICH ORGANISATIONS MATCHED THE EXPERIMENTER'S
PRE-DEFINED STRUCTURE (COMPLEX SCORING TECHNIQUE; MAXIMUM ACHIEVABLE SCORE 12; N = 52)

Pre-defined Organisation       List   Network   Hierarchy   Table
Mean % score                   90.3      65.6       32.9a   36.1a

Performance contrasts in Experiment 2. Conditions marked with the same letter (a) did not
differ from each other, according to Tukey's HSD test (P < 0.01).


DISCUSSION 


In Experiment 1 the two scoring techniques gave very similar results but in 
Experiment 2 there were contradictions between the analyses. The simple analysis, 
which asked whether the children recognised the inherent structure in the word set, 
concluded that hierarchies were more easily recognised than tables (Table 3). But in 
the more detailed analysis, where the child's ability to match his or her solution to 
the experimenter's was taken as the criterion of success, hierarchies fared no better 
than tables, as is shown in Table 4. This apparent contradiction can be seen in terms 
of the relationship between lists and tables. Tables are lists on two dimensions; the 
category relationship was represented in the columns, and the semantic association 
occurred across the rows. Many children were able to work along one or other of 
these dimensions but were unable to work in the second dimension. Alternatively, 
they may have felt that they had successfully completed the task once a list was 
produced. The production of a list organisation scored zero under the first scoring 
technique, but a score of 3 (25 per cent match) on the second scoring technique. This 
cumulative addition of small scores increased the overall mean success rate of 
responses to tabular structures. 
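The point that a table is a list in two dimensions can be made concrete. In the hypothetical Python sketch below (the words and function names are invented, not the experimental stimuli), reading down each column yields one grouping of the words (by class type) and reading along each row yields another (by semantic relative). A child who reproduces only one of these groupings has, in effect, produced a list rather than a table.

```python
from typing import FrozenSet, Iterable, List, Sequence, Set

# Invented 3 x 3 table: each column holds one class type
# (animal, young, home); each row holds a set of semantic relatives.
ROWS = [
    ["dog", "puppy", "kennel"],
    ["cow", "calf", "byre"],
    ["horse", "foal", "stable"],
]


def column_groups(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Read down the columns: one list per class type."""
    return [list(col) for col in zip(*rows)]


def row_groups(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Read along the rows: one list per set of semantic relatives."""
    return [list(row) for row in rows]


def as_partition(groups: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Ignore the order of the groups and of the words within each group."""
    return {frozenset(g) for g in groups}


def dimensions_recovered(child_groups, rows) -> List[str]:
    """Report which table dimension(s) a child's grouping reproduces exactly."""
    child = as_partition(child_groups)
    found = []
    if child == as_partition(column_groups(rows)):
        found.append("columns")
    if child == as_partition(row_groups(rows)):
        found.append("rows")
    return found


# A child who sorted by class type only: one dimension, i.e. a list.
by_class = [["dog", "cow", "horse"], ["puppy", "calf", "foal"],
            ["kennel", "byre", "stable"]]
print(dimensions_recovered(by_class, ROWS))  # ['columns']
```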


This incrementation did not occur for hierarchies, where the failure to recognise 
category labels had an equally damaging effect for either scoring system, and the 
default organisation ‘‘lists’’ did not provide a useful partial solution to the problem 
of organising these word sets. The children themselves appeared to recognise this:
they reverted to list structures in 21 per cent (Table 3) of cases for hierarchical
word sets, as compared to 67 per cent of cases for tabular structures. Over half of
the hierarchical word sets (52 per cent) were organised into a primitive list of
pairings by children who failed to recognise their pre-defined hierarchical nature.


The second scoring technique hinted that children were having less trouble with 
tables than at first thought, even though it appeared to be a more stringent criterion 
for scoring because it demanded that they reconstruct the same semantic 
relationships as the experimenter. The results from the second scoring technique are 
reasonable given that the children meet tables on a day-to-day basis in school, and 
some level of competence should be anticipated. Thus it can be argued that priming 
them as to the nature of the word set organisation does make a wider variety of 
organisational structures available to children, and the subsequent improved 
performances are statistically reliable. 
A comparison of these data with the results of Durding et al. (1977, simple
scoring technique only; Table 3) again showed that the children performed less
well overall than adults. One point of discrepancy remained: the children handled
networks (39 per cent) more easily than hierarchies (21 per cent), whereas adults
showed the reverse pattern (69 per cent and 94 per cent respectively). In the second
experiment by Durding et al. a further qualitative difference was apparent. Adults
had minimal difficulty in using the tabular structure provided (87 per cent partially 
correct responses) and showed the largest improvement in correct responses over 
Experiment 1 on this type of organisation. However, children were still having 
considerable difficulty with tables (6 per cent correct responses) according to the 
simple scoring technique. As argued earlier, the difficulties with tabular structures 
were less apparent when the more complex matching technique was employed to 
score the responses. 


Despite the problems which might ensue from the misplacement of a word with 
multiple meanings in a network or a superordinate category label in a hierarchy, the 
constraining nature of the skeletal diagram was an important factor in the successful 
completion of any one type of organisation. Indeed our evidence suggests that in all 
types of organisation, except lists, the skeletal diagram made children pause for 
thought. Their deliberations resulted in one of three main strategies or approaches 
to the problem at hand. The first approach was to order the words without recourse 
to the diagram, most often into lists or primitive lists. These lists were then placed 
word by word into the skeletal structure, but because the skeleton was not congruent 
with the organisational structure they had elected to use, there was no obvious 
starting point for these entries. An arbitrary entry point was therefore selected, and 
words were entered along a linear pathway, ignoring the tie lines which indicated 
relationships, until all the boxes were full. Comments on the sheets by a number of
children indicated that they were not satisfied with their solution to the problem,
that is, they knew they were in error but could see no other solution.


A second approach was used when they were cognisant of the skeletal structure 
but, in completing the diagram, one of the keywords was misplaced (multiple 
meaning words in networks; category labels in hierarchies), possibly because of a 
lack of pre-planning or entering material into the skeleton before all the 
relationships had been established. In that situation, they were inclined to alter the 
skeletal diagram, breaking links where they felt no relationship existed and forging 
new links where appropriate. 


The final approach, the most appropriate strategy, occurred when children 
were cognisant of the skeletal structure and actively reworked their organisational 
structure to fit the skeleton. They often combined these last two approaches and 
both reworked the word sets and redrew the structure. The results of Experiment 
2 suggest that young children are capable of constructing a range of organisational 
structures when the nature of the structure required is made apparent to them. 


EXPERIMENT 3 


In Experiment 3 the investigation examined the ability of young children to
access information from a variety of completed organisational forms.


METHOD 
Sample 
Fifty children aged between 9 and 11 years (mean age = 10.1 years,
SD = 0.6), all members of either a third or a fourth year junior school class, were
tested for their reading ability and for their non-verbal ability. The mean raw
reading score on McLeod’s (1970) GAP test was 20.4 (SD = 10.0), and the mean
non-verbal ability score on Raven's Matrices (1956) was 16.2 (SD = 5.0). No child
in this experiment had taken part in either of the previous experiments.


Materials 

The materials consisted of three word sets of 10 or 12 words each. These were the
same word sets as those used in the previously described experiments, excluding
the words organised into list structures and the random words, which acted as the
control in Experiment 1. Two separate but similar sets of stimulus material were
developed. There were, therefore, two word sets corresponding to each of the three
chosen organisational structures, giving three types of word set with two instances
of each.


Each word set was placed in its appropriate skeletal structure and appeared at
the top of the page (as in Figure 1). This was the reverse of the presentation layout
of Experiment 2, but it was consistent with the order of the task sequence. A number of
questions pertinent to the relationships in the word set, with blank spaces
for the answers, appeared underneath the organisational diagram. “Number
of” and “type of” questions were used with the same frequency, as far as possible,
across the pre-defined organisational structures. For example, a simple sub-category
question appeared on all sheets:


Hierarchy: How many games are shown on this sheet? 
Network: How many actions are shown on this sheet? 
Table: How many people are shown on this sheet? 


A more complex question might ask: 


Hierarchy: What have cricket and netball got in common? 
Network: What have Pat and June got in common? 
Table: What have road and court got in common? 


In all, six questions were used for each type of data organisation; because one
question required two answers, these yielded seven answers. Each child received a booklet
containing word sets and skeletal diagrams representing each of the three types of
experimental stimuli. The presentation order of the stimuli was randomised for each
child.


Procedure 

Children were tested in their two class groups. A booklet containing three word
sets was presented to each child. Each booklet placed a word set into each of the
three pre-defined organisational structures: hierarchies, networks and
tables. Each organisational structure was accompanied by the relevant target
questions. The two groups of three word sets that made up the booklets were
given out alternately to the children in the class, so that neighbouring children within a
class worked on different word sets.


The children were told that the words in the boxes were related to each other,
that the relationship between the words was indicated by the lines linking any two boxes
together, and that the tie-lines between boxes were bi-directional. They were then asked to
complete the questions below the diagram. It was emphasised that the answer to
each question could be found by looking at the diagram and checking the
relationships between the words found there.


They were told to work at their own pace. It was emphasised that they could
have as many tries as they wanted before coming to a final decision, but that once they
had turned over to the next example they were not to go back. They worked through
the three examples in 10 to 20 minutes.


RESULTS


Children were accorded one mark for each correct response and could,
therefore, achieve a minimum score of zero and a maximum score of seven for each
word set. They had moderate success in extracting meaning from each of the three
pre-defined organisational structures. Scores for responses to hierarchies (47 per
cent) and tables (49 per cent) were little different from each other, but networks gave
a slightly higher success rate (55 per cent).
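
The scoring rule above can be expressed as a short calculation. This is a minimal sketch. It assumes that each child's answers to one word set are recorded as seven true/false marks. The function names and sample responses are hypothetical and are not data from the study. The sketch shows how a mean raw score relates to the reported percentages. For example, 49 per cent corresponds to a mean of roughly 3.4 of the seven available marks.

```python
MAX_SCORE = 7  # six questions per word set, one of which required two answers


def score_word_set(responses):
    """One mark per correct response; responses is a sequence of seven booleans."""
    if len(responses) != MAX_SCORE:
        raise ValueError(f"expected {MAX_SCORE} responses, got {len(responses)}")
    return sum(1 for correct in responses if correct)


def percent_correct(scores):
    """Mean raw score across children as a percentage of the seven-mark maximum."""
    if not scores:
        raise ValueError("no scores supplied")
    return 100.0 * sum(scores) / (len(scores) * MAX_SCORE)


# Placeholder responses for two children on one word set (not study data).
child_1 = [True, True, False, True, False, False, True]
child_2 = [True, False, False, True, True, False, False]
scores = [score_word_set(child_1), score_word_set(child_2)]
print(scores, round(percent_correct(scores), 1))
```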


An analysis of variance indicated a main effect of organisational type (F = 3.15;
df = 2, 96; P < 0.05). Multiple paired comparisons (Tukey's HSD test) between
mean scores for each of the data structures failed to reach significance at the
P < 0.01 level advocated for paired comparisons by Meyers and Grossen (1974),
although a trend was discernible. There was a tendency for performance with
networks to be better than with hierarchies (P < 0.05). Performances with tables
were not statistically distinguishable from those with either networks or
hierarchies.


DISCUSSION 


Tentative comparisons can be made between these data and those of Brosey and 
Shneiderman (1978), who tested the relative ease of data extraction from tabular and 
hierarchical data structures for university undergraduates. They used two tasks: 
execution of queries from the database, and the commitment to memory and recall 
of the complete data structure after the query task. Only the first task, execution of 
queries, has relevance to the present study. On this task Brosey and Shneiderman's adults found it
easier to extract data from the hierarchical structure than from the table. This
finding was confirmed for data extraction from a larger database organised into 
each of the two structures, table and hierarchy. 


These findings are less directly comparable to our Experiment 3 than those of 
Durding et al. (1977) were to Experiments 1 and 2 because the reduction in task 
difficulty for the children in Experiment 3 was so much greater in comparison to the 
adult study than it was for the initial experiments. Children again performed less 
well than adults, but there was a qualitative difference in responses between adults 
and children, the latter finding tables (49 per cent correct) and hierarchies (47 per 
cent) equally difficult, while the adults were more adept at using hierarchies (tables 
70 per cent, hierarchies 78 per cent — derived figures). 


The discussion will focus upon trends apparent across the three experiments, 
and upon the relationships between individual differences and task performance. 


COMPARISON OF RESULTS ACROSS THE THREE EXPERIMENTS 


A comparison of the data from the three experiments showed an overall 
improvement in performance for children as the task changed from ordering data 
with no guidance to ordering the word sets with an explicit statement of the structure 
to be used and, finally, to extracting meaning from the pre-ordered word set. An 
analysis of variance of the percentage correct scores for the three organisational
structures (hierarchies, networks and tables) common to all three experiments
indicated main effects of experimental condition (F = 7.86; df = 2, 137; P < 0.001)
and of organisational structure (F = 62.72; df = 2, 274; P < 0.0001). There was also
a strong interaction between experimental condition and organisational structure
(F = 7.56; df = 4, 274; P < 0.0001). The pattern of performance was complex,
however. Multiple paired comparisons (Tukey's HSD test) of the common
structures, across the three experiments, confirmed that networks were handled
more easily than hierarchies (P < 0.01), and than tables in the first two experiments
(P < 0.01). In Experiment 3, although networks proved easier to extract
information from than hierarchies (P < 0.01), there was no reliable difference in
performance in the extraction of data from networks and tables.


The results suggested an improvement of performance across all types of 
structure for Experiment 2, as compared to the performances in Experiment 1, and, 
indeed, the provision of a skeletal structure in Experiment 2 proved to be an aid to 
children's recognition and use of the network organisation (P < 0.01). The multiple
paired comparisons showed that the improved performances in Experiment 2 for
hierarchies and tables (P > 0.05) were not reliably different from those recorded in
Experiment 1. The pattern of overall improvement was not maintained in
Experiment 3. Children were more successful at using existing hierarchies and
tables than at building them, but it appears that it is easier to construct a
network when primed (Experiment 2) than either to produce an unprimed network
(Experiment 1) or to extract information from it (Experiment 3).


Ability levels 

Table 5 presents correlations between task performance (complex scoring 
technique) and ability levels. The tasks in all three experiments were inherently 
classificatory; such tasks could legitimately be expected to draw on non-verbal 
abilities (Bruner et al., 1956), and, following the work of Turner et al. (1984), also 
be related to reading ability. In Experiment 1, the most striking finding of the 
analysis was the failure of reading and non-verbal abilities to predict the 
performance scores recorded for each organisational type. There are two possible 
contributory factors to this result: insufficient range of ability in the sample, and the 
existence of both ceiling and floor effects as a result of the high performance when 
working with lists compared to a very poor performance by the majority of children 
when working with other structures. Although one might have expected the more
gifted children to use a greater variety of organisational structures than their weaker
peers, this was not the case; list structures were the predominant organisation used.
However, those who performed well on constructing list structures also performed
well in constructing each of the other three organisational structures (P < 0.05).


In Experiment 3 success at responding to questions on all three data structures
was related to non-verbal ability (Table 5), while success at responding to questions on
networks and tables was correlated with reading age, although this relationship did
not hold for hierarchical structures. Successful completion of the task on any one of
the structures was related to success on each of the other structures; thus success
with hierarchies was related to success with networks (r = +0.49; df = 49; P < 0.001)
and with tables (r = +0.43; df = 49; P < 0.01), and this relationship held for
networks and tables (r = +0.43; df = 49; P < 0.01).


TABLE 5


CORRELATIONS BETWEEN READING AND NON-VERBAL ABILITIES AND TASK PERFORMANCE (COMPLEX
SCORING TECHNIQUE) FOR THE THREE PRE-DEFINED ORGANISATIONAL STRUCTURES COMMON ACROSS THE
EXPERIMENTS


                            Pre-defined Organisation
Ability        Experiment   Hierarchy   Network     Table

Reading Age        1         0.294       0.149       0.198
                   2         0.515***    0.538***    0.237
                   3         0.269       0.528***    0.438**

Non-verbal         1         0.058      -0.172       0.004
Ability            2         0.476***    0.634***    0.249
                   3         0.350*      0.448**     0.525***


Level of significance of correlation: * P < 0.05; ** P < 0.01; *** P < 0.001.


Comparisons across the three experiments produced no clear picture of the 
influence of ability on the rate of success children might achieve in either setting up 
or extracting information from the different data structures (Table 5). This result 
may well have been due to the low success rates achieved when working with 
hierarchies and tables. The construction and extraction of information from 
networks was strongly related to ability levels. Success in constructing hierarchies 
was also related to both non-verbal and reading ability but extraction of meaning 
from a hierarchy showed reduced dependence on non-verbal ability, and no 
influence of reading ability at all. Tabular structures showed the reverse pattern to 
hierarchies. In Experiment 3, ability components appeared highly significant but 
these relationships were not present in Experiment 2. 


GENERAL DISCUSSION 


The three experiments reported here were designed to explore the ease with 
which young children (9-11 years of age) could construct and extract data from a 
variety of organisational structures. This work was stimulated by the rapidly 
increasing use of knowledge-based computer packages in schools, and extends the 
conclusions reached by Durding et al. (1977) and, to a lesser extent, by Brosey and
Shneiderman (1978). 


Experiments 1 and 2 encourage us to approach the use of knowledge handling 
packages with caution. Unlike the undergraduates studied by Durding et al., these young children were
not very successful in discovering the inherent organisation of information, and
showed a marked tendency to impose a standard structure (a list) on all data
regardless of its inherent organisation. This result held true whether they were
primed to the appropriate organisational structure or not, although priming did
improve performance on the non-list organisations. Hierarchies and tables were
perceived as equally difficult by children, while networks were somewhat easier to
construct. This was a very different result from that of Durding et al., whose results showed
that undergraduates were well able to reconstruct hierarchies, networks and,
particularly, lists. Tables were more difficult to reproduce, however. The dominance
of the list structure for the majority of children, and the poor performance on all 
other structures in Experiment 1 masked individual differences and the influence of 
ability measures. Non-verbal and reading abilities were controlling variables for all 
data structures, except tables, in Experiment 2. 


On a more hopeful note for those working with young children, Experiment 3 
shows that the children were able to extract data from hierarchies, networks and 
tables with about 50 per cent success. We might, therefore, expect junior school 
children to use, if not construct, each of these organisational structures with similar 
facility. A 50 per cent success rate is still quite low, although it seems that in working 
with networks and tables at least, low scores might be partially attributed to the 
children's view that one answer is sufficient for any question! Again, comparing 
these results to adult studies (Brosey and Shneiderman, 1978), the most striking 
finding was the equivalence of performance on hierarchies and tables by the young 
subjects (tables being the easier of the two), while adults found it far easier to work 
with hierarchies than tables. Non-verbal ability proved to be a key predictor of data
extraction, although reading ability was also highly correlated with the successful
use of networks and tables.


What do these findings mean for the use of information handling packages in 
school? The difficulties experienced by children, compared to those of adults in 
similar circumstances, cannot necessarily be explained in terms of the development 
of individual abilities. It could be argued that adult performances are a result of
greater exposure to, and therefore familiarity with, the types of structures used in these
experiments. Equally, the low levels of success may have resulted from the fact that
the children could not see the purpose of a task which merely asked them to organise the
words. Had the task been placed in context, they might have been more successful.
In either case, if information handling packages are to be used in school, then 
teachers will need to ensure that children have had exposure to the relevant 
structures and have come to value the different methods of organising data, in order 
to achieve successful outcomes. 


Research by Bransford (1979) adds another caveat to the application of these 
results. In a discussion of concept formation, he points out that in the majority of 
concept-identification experiments participants had to identify which member of a 
set of known concepts the experimenter had in mind (e.g., red triangles, cf. Bruner 
et al., 1956). Once the correct concept had been identified, or the experimenter made 
the rule explicit to the subject, problem solving became relatively easy (Anderson 
and Kulhavy, 1972). Bransford argues that this is a very different situation from that 
normally existing in the classroom where the child is generally provided with the 
definition of new concepts rather than being allowed to discover them, and thus he 
fails to understand adequately or to transfer knowledge. Essentially, this suggests 
that data structures must be made transparent to children if they are to successfully 
organise or extract information from a particular structure. 


RECENT COGNITIVE THEORIES APPLIED TO SEQUENTIAL
LENGTH MEASURING KNOWLEDGE IN YOUNG CHILDREN 


By GILLIAN M. BOULTON-LEWIS
(Brisbane College of Advanced Education, Australia) 


Summary. This research was designed to determine sequential length measuring 
knowledge in children aged 3-7 years. Sequences were predicted in advance logically from 
measurement theory, from a review of the literature, and from the information 
processing demand of the tasks. A sample of 80 children from mixed socio-economic 
backgrounds was tested on measures of capacity to process information and 15 main 
measurement tasks. Analysis of the data showed that the empirical sequence of length 
measuring knowledge was most like that predicted from analyses of the information 
processing demand of the tasks. It is asserted that mathematics curriculum content could 
be sequenced on the basis of similar information processing analysis. 


INTRODUCTION 


IN the 1980s much of the research in mathematics education has focused on
determining students' primitive conceptualisations of mathematical ideas and
processes; on the relationship between the students' conceptual structures of these
ideas and formal mathematical structures; on the modification of students' concepts
into mature understanding, and on identifying the factors that influence the 
modifications. Lesh and Landau (1983) summarised the recent research in the 
development of mathematics ideas and contrasted it with child development 
research. They maintained that child development research tends to focus on 
cognitive capabilities that are invariant across bodies of subject matter; general 
changes in cognitive capabilities before and after major cognitive organisations (in 
particular at 2 years, 6-7 years and adolescence); and ideas that most students 
acquire “naturally” without instruction. They maintained that cognitive research
has produced generalisations that researchers in mathematics ideas have considered 
to be too crude. They asserted that mathematics educators are now focusing directly 
on children's mathematical ideas because of their interest in substantive 
mathematics content and educational implications. Lesh and Landau also suggested 
that behavioural research based on task analyses (e.g., Klahr and Siegler, 1978; 
Siegler, 1981) “generate[s] models of isolated-task behaviour that ignore[s] certain
mathematical ideas that a mathematics educator would regard as fundamental to the 
task.” 


Part of the disillusionment with cognitive research has probably been caused by 
the fact that the results of Piaget’s work and the similar research that it inspired have 
not been easy to apply to mathematics education. In the 1960s and 1970s determined 
efforts were made to apply Piaget’s cognitive structural framework to designing 
mathematics curricula. The effects still persist in many state curricula in Australia
and have caused a certain laissez-faire approach to teaching mathematics to young
children in particular. Biggs and Collis (1982) noted that it led to beliefs, for
example, that teachers of young children should wait, and only provide informal
pre-number games and activities, until a “point of readiness” was reached. The
point of readiness was determined by the child passing conventional conservation 
tasks of the type devised by Piaget. With the benefit of hindsight we can see that 
Piaget's theory could not be directly applied to teaching mathematics. He was 
concerned with the logic of thinking and with providing a formal logical description 
of human knowledge as it developed over time. This concern led Piaget and his 
colleagues to devise tasks to assess the presence or absence of certain 
logical/mathematical structures in children's thinking. It is obvious that such tests
would produce the kind of data that Piaget obtained. They showed the approximate
ages at which children possess the logical structure to succeed on a particular task.
They present a picture of deficits in the thinking of pre-operational children. They
do not show the preliminary skills and knowledge in each of the areas of
mathematics that children possess before they succeed on a particular task. Siegler 
(1981) suggested that mathematics educators should look more closely at the 
knowledge that young children possess without feeling obliged to fit it to a 
preconceived Piagetian framework. The research in mathematics education appears 
now to be taking up such a challenge. 


Recent cognitive research, however, is not such a “crude” tool for mathematics
educators as Lesh and Landau suggested. In particular the work of Halford, Case 
(and Pascual-Leone and Fischer) has much to offer to mathematics education. Each 
of the theories is concerned with the notion of an increasing upper limit to 
children's capacity to process information. Case (1985) and Halford (1982) have 
employed a variety of tests to assess that capacity at approximate ages. Halford 
(1982) has described sequential levels of children's ability to relate symbols to 
environmental elements and has identified classes of concepts belonging to each 
level (Halford, in press). These include mathematics concepts and provide a basis 
for determining the demand on processing capacity of content, and hence sequence, 
in mathematics curricula. Pascual-Leone (1970) proposed an M-space construct to 
describe increases in children's central computing space with age. M was a measure 
of the maximum number of chunks of information that a subject could control and 
integrate in a single decision. The problem of measuring and quantifying children's 
information processing space has since been the subject of major research by Case 
and associates. Case (1985) has described tasks that children should be able to 
perform with increases in age and processing space. Fischer (1980) has posited a 
hierarchy of skills that humans control and construct. These skills increase in 
structural and representational complexity with age and experience. Fischer's 
descriptions of the skills at each level can also be used as a framework to describe 
mathematical tasks. Each of the theories of Halford, Case and Fischer to a greater 
or lesser degree posit that unevenness of development across tasks and across bodies 
of knowledge will be the rule not the exception. Children apply capacity to process 
information differentially to specific salient tasks in their environment. 


The research described in this paper was intended to determine children's 
sequential knowledge of length measuring. It was designed to predict the sequence 
of development of knowledge of a particular mathematical idea from recent 
cognitive theories, and to compare the predictions with empirical research. 


Three different sequences of length measuring knowledge were posited before 
children were tested. The first sequence was a logico/mathematical one based on 
measurement theory. The component skills (variables, labelled V) required for length
measuring were determined and ordered according to whether they were logical 
prerequisites for subsequent skills. Research findings for each aspect of length 
measuring, identified in the logico/mathematical analysis, were reviewed. The 
second sequence of skills was based on that review. The skills were ordered 
according to the approximate ages at which 50 per cent or more of children 
apparently demonstrated mastery. The third sequence was based on analysing the 
demand that each of the skills, identified in the first sequence, would make on 
children's capacity to process information. The skills were then sequenced a third 
time according to their hypothesised demand (cf. Case and Halford in particular), 
the norms for increase in capacity to process information, and classes of tasks 
possible at each level (and approximate age). The three sequences are shown in 
series. The learned use of such a procedure may or may not rest on full knowledge of
the ratio scale embodied by a standard length measuring device. 


The logico/mathematical sequence in Table 1 was proposed on the basis of the 
extent to which the tasks required knowledge of the axioms of quantity, 
classifications of scales (i.e., fundamental vs. derived and according to numerical 
quantities) and direct and indirect procedures for measuring. Transitive reasoning 
(V 24) precedes conservation of invariance of length (V 22) in the sequence because 
the logical definition for equality rests on transitive reasoning (cf. Nagel, 1931).
The ability to derive a length measuring strategy (V 35), using discrete arbitrary
units of measure in an additive manner, precedes knowledge of a measuring
device as a representation of a ratio (V 30), and the informed use of such a scale.
Subitizing (V 8) was placed first in the number sequence because it was posited that
the ability to identify the numerosity of a small group probably rests entirely on
perception (Gelman and Gallistel, 1978, disputed that view). Knowledge of one-to-one
correspondence (V 14) logically precedes the ability to count (V 34), because
counting is a specific instance of one-to-one correspondence between the members
of a set and the set of number names in order. Knowledge of ordinal aspects of
number (V 15, relative magnitudes) logically depends on and follows the ability to
quantify sets (V 5) by counting (V 34).
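
The logical ordering described above arranges skills so that each one follows its prerequisites, which is in effect a topological sort. The sketch below illustrates this under stated assumptions and does not reproduce the author's Table 1. It encodes only the prerequisite pairs named in this paragraph. It treats "placed first in the number sequence" as V 8 preceding V 14, and it leaves out V 5, which is not otherwise defined in this section. It uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Prerequisite relations named in the paragraph above (illustrative subset only).
# Each key maps a skill to the set of skills it logically presupposes.
PREREQUISITES = {
    "V22": {"V24"},  # conservation of length invariance rests on transitive reasoning
    "V30": {"V35"},  # measuring device as a ratio follows the additive unit strategy
    "V14": {"V8"},   # subitizing placed first in the number sequence (assumed edge)
    "V34": {"V14"},  # counting is a case of one-to-one correspondence
    "V15": {"V34"},  # relative magnitudes follow quantifying sets by counting
}


def logical_sequence(prereqs):
    """Return one ordering of skills in which every prerequisite comes first."""
    return list(TopologicalSorter(prereqs).static_order())


order = logical_sequence(PREREQUISITES)
print(order)
```

Skills that have no stated prerequisite relation to each other may appear in any relative order. Only the constraints encoded in the mapping are guaranteed.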


Literature sequence 

The second sequence was based on the literature review described in Boulton- 
Lewis (1983). A search of literature since that date has revealed little change in the 
approximate ages at which skills are learned. Significant sources for approximate 
ages for the sequence of variables were as follows: 


V 8, Subitizing (and/or quantification of small numbers) has been found to 
occur at approximately 3 years (Piaget, 1969; Schaeffer et al., 1974; Gelman
and Gallistel, 1978). 

V 16, Recognition of equality/inequality relations of length for pairs of 
objects begins to occur at about 3 years of age (Piaget et al., 1960; Smedslund, 
1963; Bryant and Trabasso, 1971; Terman and Merrill, 1937). 

V 18, Construction of a horizontal straight line should be possible between 
3 and 4 years (Piaget and Inhelder, 1967; Laurendeau and Pinard, 1970; 
Halford and MacDonald, 1977). 

V 17a, Ordering of lengths pair by pair occurs from about 4 or 5 years 
onwards (Inhelder and Piaget, 1964; Achenbach and Weisz, 1975). 

V 17b, Seriation of length on a logical basis seems to be possible from 
5 years onwards (Inhelder and Piaget, 1964; Achenbach and Weisz, 1975). 

V 21, Recognition of length invariance (without explanation). Children 
should be able to succeed on this task from about 5 years of age when they can 
make pair by pair comparisons. 

V 29, Measure with a ruler (without explanation). Children should be able 
to line up an object with a measuring device at about 5 years and make a 
comparison as for V 21. 

V 34, Counting to 10 with a rule (Sum of V 9, V 11, V 13) should become 
possible from about 4 or 5 years onwards (e.g. Wang et al., 1971; Schaeffer et 
al., 1974; Gelman and Gallistel, 1978). 

V 23, Correct response to transitivity task. This has been posited to occur
by about 4 or 5 years. Such responses do not necessarily require transitive
reasoning (Braine, 1959, 1964; Bryant and Trabasso, 1971; Harris and Bassett,
1975).

V 19, Construction of a diagonal line is usually possible by about 5 or 6
years (Piaget and Inhelder, 1967; Laurendeau and Pinard, 1970; Halford and
MacDonald, 1977).

V 15, Knowledge of relative magnitudes of small numbers (4 > 3, 7 > 6) 
should occur by 5 or 6 years. With numbers beyond 2 or 3, it probably depends 
on counting ability. 

V 22, Conservation of length invariance should be present from 6 to 7 years 
(Piaget et al., 1960) depending on the difficulty of the task. 

V 14, One-to-one correspondence to determine equivalence and cardinal 
number should develop at about 6 to 7 years according to Piaget (1969). 

V 24, Transitive reasoning has been posited to occur any time from 4 to 7 
years (Braine, 1959, 1964; Piaget et al., 1960; Smedslund, 1963a, 1963b, 1966; 
Carey and Steffe, 1968; Bryant and Trabasso, 1971; Harris and Bassett, 1975; 
Trabasso et al., 1975; Halford and Galloway, 1977; Breslow, 1981; Halford, 
1982). 

V 35, Length measuring strategy (sum of V 25, V 26, V 27, V 28). Children 
do not usually derive algorithms on their own initiative to measure non-aligned 
lengths by subdivision or iteration of a unit until about 8 or 9 years of age 
(Piaget et al., 1960; Bailey, 1973; Carpenter and Lewis, 1976). 

V 30, Ability to measure with a standard device such as a ruler should occur 
at about the same age as the preceding variable. 


The ages in the literature at which children succeeded on the tasks summarised 
above varied quite widely. The sequence of length measuring knowledge based on a 
review of the literature was of necessity therefore only approximate. 


Information-processing sequence 

The third sequence of skills was based on analysis of the demands of each task 
matched with norms for increasing capacity to process information with age (cf. 
Case and Halford). 


Halford (Halford and Wilson, 1980; Halford, 1982) has proposed equivalence
classes of tasks at each of four levels (1C, 1R, 2 and 3). The tasks are grouped
according to the information they require for solution. Halford determined the
minimum age of mastery of a class of tasks empirically by testing and training, by
category analysis, by norms for short-term memory and by analysing the underlying
mathematical structure. The tasks listed in Table 1 were assigned to levels according
to analysis of the mathematical structure of each (viz. single category concepts;
single relations between concepts; bivariate functions or relations; composition
of binary operations) and the requirement they would make on information
processing capacity.


Case (1972, 1974, 1978, 1985) and Pascual-Leone (1970) have developed an 
M-space construct which is a measure of the child's working memory or central 
computing space. Measures of M include an executive scheme (e or a) plus the 
maximum number of discrete chunks of information (figurative and operative 
schemes) that M can control and process in a single decision. Case and Pascual- 
Leone have developed a range of tests of M-space. They have shown that 
performance on these tests increases with age and that they predict performance on a 
variety of cognitive tasks. The tasks listed in Table 1 were analysed according to the 
number of chunks (or schemes) which would need to be held and processed in M or 
working memory. Each task was assigned an M-space value of e + n. In some cases it
was difficult to assess the number of chunks required in M space by a particular task 
because it was difficult to ascertain the demand of e. In Halford's theory, on the 
other hand, the level of a task indicates the minimum age at which a strategy for 
solution can be acquired on the basis of the minimum information that is logically 
required in a single decision. 
Most of the skills required for length measuring were categorised by level
(Halford) and M demand (Case). Details of the analysis of each task according to
the theories of Case and Halford are presented in Boulton-Lewis (1983). It would
have been possible to analyse the tasks further on the basis of Fischer's (1980) theory
of control and construction of a hierarchy of skills. Unfortunately, this information
was not available until the research described here was well under way.


METHOD 


Sample 

This consisted of 80 children from kindergartens and primary schools in the 
southern suburbs of Adelaide, South Australia. There were eight boys and eight girls 
at five age levels (mean ages at each level, in years:months, were 3:7, 4:6, 5:6, 6:5 and 7:5). The sample
included equal numbers of children from low, middle and high socio-economic 
backgrounds. 


Procedure 

Each child was withdrawn and tested individually in two separate sessions. The 
tests included an M-space measure, two measures of short-term memory, and tests 
for variables 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 (a and b), 18, 19, 20, 21, 22, 23, 24, 
29, 30, 34 (sum of 9, 10, 11 and 13) and 35 (sum of 25, 26, 27, 28). The tasks are 
described briefly here and in detail in Boulton-Lewis (1983). 


M-space was measured with the Card Counting Test used by Case (1977). It was 
considered to be the most suitable of the tests of M-space for children aged 3-7 
years. Children were presented with five sets of cards at five levels. The sets at level 1 
contained one card each and increased to five cards per set at level 5. Each card 
contained from 2-9 coloured count symbols and from 2-9 distractor symbols. The 
child was required to recall the number of coloured count squares on all the cards in 
a set in the order of presentation. The child's M-space was assessed at the highest 
level at which s/he recalled at least 4/5 sets correctly. Other minor scoring details 
were as described by Case (1977). 


Measures of short-term memory: digit span. The repeating of digits test from
the Stanford-Binet Intelligence Scale (adapted from Terman and Merrill, 1964) was
used to measure digit span as an index of short-term memory. Children were
presented with three series at each length from 2 to 6 digits and asked to repeat each
series in the order of presentation. Repeating two out of three series of a given length
correctly was necessary to credit a short-term memory span of that length. Word span.
A repeating of words test was devised for this study as an additional measure of
short-term memory in young children. Nine one-syllable nouns were chosen from a list
of the single words most frequently used by Australian children (Hart et al., 1977).
Pictures were used in advance of testing to ensure that children knew all the words.
The words were substituted for digits and administered and scored as for the test above.
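The Card Counting Test and the two span tests share one scoring rule: a child is credited with the highest level (cards per set, or series length) at which a criterion number of trials was passed, 4 of 5 sets for M-space and 2 of 3 series for digit and word span. A minimal Python sketch of that rule follows, assuming trial outcomes are recorded as booleans per level; the function name `highest_level_passed` is hypothetical, and the "other minor scoring details" of Case (1977) are not modelled.

```python
from typing import Dict, List


def highest_level_passed(results: Dict[int, List[bool]], criterion: int) -> int:
    """Return the highest level at which at least `criterion` trials were correct.

    results maps a level (cards per set, or series length) to the outcomes of
    the trials given at that level. Returns 0 if no level meets the criterion.
    """
    passed = [level for level, trials in results.items() if sum(trials) >= criterion]
    return max(passed, default=0)


# M-space: five sets per level, at least 4 of 5 correct.
m_space = highest_level_passed(
    {1: [True] * 5, 2: [True, True, True, True, False], 3: [True, False, True, False, False]},
    criterion=4,
)

# Digit span: three series per length, at least 2 of 3 correct.
digit_span = highest_level_passed(
    {2: [True, True, True], 3: [True, True, False], 4: [False, True, False]},
    criterion=2,
)
print(m_space, digit_span)
```

As a design choice, the sketch credits the highest passing level even if a lower level was failed, since the text does not say whether lower levels must also be passed.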


V 8 Subitizing. The ability to quantify the numerosity of small sets without
overt counting was measured by showing children 20 cards in random order
depicting four sets for each of the numbers 1-5. Ten cards were photographs of
well-known objects in random array and 10 showed stars arranged in straight lines and
random arrays for each of the numbers. Children were asked to tell the number as
quickly as possible, without counting. To be assessed as subitizing a given number,
children were required to respond correctly to all four sets for that number.
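The subitizing criterion is all-or-none per number: one error on any of the four cards for a number means that number is not credited. The Python sketch below applies that criterion to a hypothetical record of (number shown, number said) pairs; the function name and the data are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def subitized_numbers(trials: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the numbers (1-5) counted as subitized.

    trials holds (number shown, number said) for each of the 20 cards.
    A number is credited only if all four of its cards were answered correctly.
    """
    correct = defaultdict(list)
    for shown, said in trials:
        correct[shown].append(shown == said)
    return sorted(n for n, hits in correct.items() if len(hits) == 4 and all(hits))


# Hypothetical record: every card correct except one card showing 5 (said 4).
cards = [(n, n) for n in range(1, 6) for _ in range(4)]
cards[-1] = (5, 4)
print(subitized_numbers(cards))  # [1, 2, 3, 4]
```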


V 11 and V 12. Number names in sequence from 5 to 10, and 10 to 100. To 
measure this knowledge children were asked to clap and count. They were 
encouraged to count to 100 if possible. 
V 14 One-to-one correspondence to determine equivalence and cardinal
number (and also V 9 Enumeration, V 10 number names to 5, V 13 A rule for 
counting) were measured in this task. Children were shown two cards each 
containing two parallel lines of seven and nine stars respectively. Each card was 
presented with one line of stars covered. Children were asked to count the uncovered 
line and tell how many. Note was made of enumeration procedure, number names in 
order and the presence of a rule to determine the cardinal number of a set by 
counting. The second line of stars on the card was uncovered and the child was 
asked to tell quickly, without counting, how many. If children could do this for both 
cards and give an explanation that indicated one-to-one correspondence (e.g., 
they've all got partners) they were assessed as knowing that equivalent sets, in one- 
to-one correspondence, had the same cardinal number. 


V 15 Relative magnitude of small numbers (4 > 3, 7 > 6). The children were 
shown, for example, first a card with a photograph of three then one with four 
familiar objects. They were asked how many on each card. The cards were removed 
and they were asked if three was more than four or vice versa. To succeed on this 
task the children were required to give the correct answer and explain adequately 
(e.g., there is one more in four). 


V 16 Recognition of equality/inequality of length. This task was used to assess
knowledge of quantitative terms for length (long, longer, short, shorter, same, or
equivalents) and ability to distinguish objects of equal or unequal length. The
children were shown 10 by 2 cm, 10 by 3 cm and 10 by 4 cm pieces of liquorice. A 
piece was selected and put on the table and the child was asked to find a longer or 
shorter piece or one of the same length. Children succeeded in this task if they found 
all three pieces as directed, and described them. 


V 17 (a) Ordering of lengths pair by pair and/or V 17 (b) Seriation. Children
were shown five cylindrical rods (diameter 1.8 cm), of different colours, from 8 to 5
cm in length with 0.75 cm differences between rods, standing in a timber base. The
rods were discussed then tipped onto the table and children were asked to put them 
back as they had seen them. Children were assessed as succeeding if they put all the 
rods back in order whether they used pair by pair comparisons or a seriation strategy 
(cf. Inhelder and Piaget, 1964). 


V 18 Construction of horizontal and V 19 diagonal straight lines, and V 20
recognition of a straight line (cf. Halford and MacDonald, 1977). Children were
shown two boards, 17 × 17 cm, each divided into 25 squares of 3.4 × 3.4 cm. The
experimenter made a straight horizontal line of blue counters on one board. 
Children were asked to duplicate the line on the other board using red counters. The 
procedure was repeated for a diagonal line from the top left to lower right hand 
corners. Children were also asked to correct, if necessary, an experimenter's not- 
straight line. Children succeeded if they duplicated and corrected the lines 
successfully. 


V 21 Recognition of length invariance (without explanation) and V 22
conservation of length invariance. Children were shown two rods of 15 cm (red and
yellow) and one rod of 14.5 cm (blue) adjacent, parallel and with left ends aligned.
They were asked which rods were the same and different. The rods were then placed 
in pairs in two configurations to produce horizontal/vertical illusions. First, the red 
rod was placed in a vertical position perpendicular to the mid-point of the yellow 
rod. Second, the yellow rod was placed in a horizontal position perpendicular to the 
mid-point of the red rod. The same procedure was repeated for the blue and yellow 
rods. Children succeeded on V 21 if they said, without explaining, that the first two
pairs were the same and the second two pairs different. They succeeded on V 22 if
they gave a conserving explanation.
V 23 Correct response to transitivity task and V 24 transitive reasoning (cf.
Halford and Galloway, 1977). Three rods, red 32.5 cm, green 32 cm and blue
31.5 cm, were used. Children were shown the red and green rods parallel with left
ends aligned. They were asked which was longer. The procedure was repeated for 
the green and blue rods. The rods were removed and the child was asked whether the 
red rod was shorter or longer than the blue one. Memory of original comparisons 
was checked by placing the three rods on the table in random order with non-aligned 
ends covered and the other ends aligned. If children explained that red was longer 
than blue because red was longer than green and green was longer than blue, then they
succeeded on V 24. If they remembered the comparisons and made a correct 
response but gave a non-inferential or no explanation they succeeded on V 23. 
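The inference credited under V 24 has a simple logical form: two premises about adjacent pairs license a conclusion about a pair never compared directly. The Python sketch below makes that form explicit by closing a set of "longer than" premises under transitivity; it illustrates the logic of the task only, not how children actually reason, and the function name is hypothetical.

```python
from typing import Set, Tuple


def infer_longer(premises: Set[Tuple[str, str]], a: str, b: str) -> str:
    """Compare rods a and b using only premises (x, y) meaning 'x is longer than y'.

    The premises are closed under transitivity: from x > y and y > z, infer x > z.
    """
    longer = set(premises)
    changed = True
    while changed:
        changed = False
        for x, y in list(longer):
            for y2, z in list(longer):
                if y == y2 and (x, z) not in longer:
                    longer.add((x, z))
                    changed = True
    if (a, b) in longer:
        return "longer"
    if (b, a) in longer:
        return "shorter"
    return "undetermined"


# The two comparisons shown to the child; red and blue are never seen together.
seen = {("red", "green"), ("green", "blue")}
print(infer_longer(seen, "red", "blue"))  # longer
```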


V 29 Measure with a standard device, a 30 cm ruler (without explanation) and V 
30 (with explanation). Children were shown a 30 cm ruler marked and numbered in 
cm units. They discussed the 30 cm ruler and were then asked to measure separately
a red 28 cm and a patterned 27 cm length of ribbon. If children said that the red 
ribbon was longer than the patterned ribbon by showing how far each went on the 
ruler (or a similar response) they succeeded on V 29. If they explained that the red 
ribbon was longer than the patterned one because one was 28 cm and the other 27 cm 
they succeeded on V 30. 


V 34 Counting to 10 with a rule. This variable was the sum of V 9 enumeration, 
V 11 number names and V 13 a rule for counting, as described earlier. 


V 35 Length measuring strategy (sum of V 25, V 26, V 27 and V 28) using
arbitrary units. Children were shown five pairs of “roads” one after another. Each
“road” was made of a number of black strips of equal length stuck end on to a
white backing. Each of the first pair of roads had the same number of sub-sections 
of the same length in straight lines (V 25). For each of the other pairs, one road
differed from the other in the number of sub-sections (V 26) or in length and/or
configuration (zig-zag) (V 27 and V 28). Children were provided with unit measuring strips of the same
length as the units of the first road in each pair. Children were asked for each pair if 
a toy car had as far to travel on each road. To succeed on each task the child had to 
explain that the lengths of the roads were the same or different and explain why in 
terms of the length and number of units in each road. 


RESULTS 


The results are summarised in Figure 1, a model of the sequence of
development of length and number knowledge in children aged 3 to 7 years. It was
constructed from analyses of the data through the SPSS Guttman scaling procedure
(Nie et al., 1975) and the Guttman-Lingoes Multiple Scalogram Analysis (MSA;
Lingoes, 1973), cross-tabulation of the variables item by item, and phi and
McNemar chi-square statistics for those tables (Siegel, 1956). The phi value was used
to confirm the sequence or co-occurrence of variables. The McNemar chi-square
statistic showed where significant one-way associations existed between pairs of
variables. In addition, the placement or juxtaposition of variables in the sequence
and the distance between each pair were determined by the number of correct
responses to each item. The sequences for length and number are the Guttman
scales. The variables included in parentheses were scaled by the Guttman but not by
the MSA procedure. The coefficients of reproducibility for the Guttman length and
number scales and the MSA scales were above 0.9, and hence the scales were valid.
The coefficients of scalability were above 0.6, and hence the scales were
unidimensional and cumulative.
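The two coefficients quoted above can be illustrated with a small computation. The Python sketch below assumes a children-by-items matrix of 0/1 scores, counts errors by the Goodenough-Edwards method (each child's pattern is compared with the ideal pattern of passing exactly the easiest items up to that child's total score), and takes the coefficient of scalability as (CR − MMR)/(1 − MMR), where MMR is the minimum marginal reproducibility. These are common textbook definitions; the exact error-counting rules of the SPSS and MSA programs may differ, and the function name is hypothetical.

```python
from typing import List, Tuple


def guttman_coefficients(scores: List[List[int]]) -> Tuple[float, float]:
    """Return (coefficient of reproducibility, coefficient of scalability).

    scores[i][j] is 1 if child i succeeded on item j and 0 otherwise.
    """
    n, k = len(scores), len(scores[0])
    passes = [sum(row[j] for row in scores) for j in range(k)]
    # Items ordered from easiest (most passes) to hardest.
    order = sorted(range(k), key=lambda j: -passes[j])

    errors = 0
    for row in scores:
        total = sum(row)
        ideal = [1 if rank < total else 0 for rank in range(k)]
        actual = [row[j] for j in order]
        errors += sum(a != b for a, b in zip(actual, ideal))

    cr = 1 - errors / (n * k)
    # Minimum marginal reproducibility: accuracy from item marginals alone.
    mmr = sum(max(p, n - p) for p in passes) / (n * k)
    cs = (cr - mmr) / (1 - mmr)  # undefined if every item is passed (or failed) by all
    return cr, cs


perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(guttman_coefficients(perfect))  # (1.0, 1.0)
```

By the criteria in the text, a scale would be accepted when the first value exceeds 0.9 and the second exceeds 0.6.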
It can be seen from Figure 1 that the variables form three or four clearly
identifiable groups. Variables 18, 16, 17, 9, 8 and 10 are the first concepts in the
sequence. There was no significant chi-square value between any pair of these
variables that would indicate strong directionality, although the scalogram analyses
gave a sequence for these skills. It was hypothesised from the literature review that
children from approximately 3½ years would succeed on all of these tasks. The
analysis of the information processing demand of these tasks suggested that they
required Level 1 reasoning and an M-space of e + 1 or e + 2. It was hypothesised that
subitizing and some knowledge of number names in sequence would be the earliest
concepts to develop. However, constructing a straight line and cognising a binary
relation between two lengths probably require less knowledge of learned symbols
than subitizing or reciting number names in order, and should perhaps have been
expected to develop earlier.


The second group of variables includes 21, 34, 15, 22, 23 and 19. It was 
predicted from the literature review that Variables 21, 34, and 23 would be possible 
some time after age 4, that Variable 19 would be possible by about 5 years of age, 
Variable 15 at 5½ years and Variable 22 at 6 years. Variables 21, 34, 23 and 15 were
considered to be Level 1 tasks with an M demand of e + 2 and Variables 22 and 19 
were considered to be early Level 2 tasks with an M demand of e + 2 or 3. On the
basis of the chi-square value, there was a clear one-way association between the last
variables in the first group (17 and 8) and the first variables in the second group
(21 and 34). This second group appears to consist of late Level 1 and early Level 2 tasks.
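The phi coefficient and McNemar chi-square used above are both computed from the 2 × 2 cross-tabulation of two pass/fail items. A minimal Python sketch follows; it uses the continuity-corrected McNemar statistic, (|b − c| − 1)² / (b + c) with 1 degree of freedom, as presented in Siegel (1956), and the helper names and example data are hypothetical. A significant value with c close to zero (almost no child passing the second item without the first) is the pattern read as a one-way association, with the first item preceding the second.

```python
import math
from typing import Sequence, Tuple


def cross_tab(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Cells of the 2 x 2 table for two pass(1)/fail(0) items.

    a: pass both, b: pass x only, c: pass y only, d: fail both.
    """
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def phi(a: int, b: int, c: int, d: int) -> float:
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def mcnemar(b: int, c: int) -> float:
    """Continuity-corrected McNemar chi-square (1 df) on the discordant cells."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)


# Hypothetical data: 10 children pass item X, and only 4 of them also pass item Y.
x = [1] * 10 + [0] * 5
y = [1] * 4 + [0] * 6 + [0] * 5
a, b, c, d = cross_tab(x, y)
print((a, b, c, d), round(phi(a, b, c, d), 3), round(mcnemar(b, c), 3))
```

Here the McNemar value is 25/6, about 4.17, which exceeds 3.84, the 5 per cent critical value of chi-square with 1 degree of freedom, so X would be placed before Y.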


The third group of variables includes 29 and 14. Variable 29 occurs much later 
than predicted from the literature review or from the analysis of demand of the 
tasks. It probably depends on integration of knowledge of number, and of the use of 
a ruler as a standard device, as well as the ability to cognise a binary relation and 
respond correctly to a transitivity type task. Variable 30 develops at about the same 
time as Variables 29 and 14 but a little before Variable 35. All four variables (29, 14, 
30 and 35) precede Variable 24, which is probably in a group on its own. It was
hypothesised in the literature review and the analysis of task demand that Variable
24 would develop before Variables 30 and 35 rather than after them, and that they
would depend on Variable 24.


The sequences shown in Figure 1 differ quite markedly from the logico/ 
mathematical order of knowledge of length measuring. In particular it was posited 
in the logico/mathematical analysis that transitive reasoning should precede the 
ability to conserve length, to measure with arbitrary units, and to measure with a 
ruler. Measuring with arbitrary units should precede the use of a ruler. It was also 
asserted that the ability to make use of one-to-one correspondence should precede 
counting with a rule and knowledge of comparative magnitude of numbers. As can 
be seen from Figure 1, empirically this is not the case. 


The sequence of the variables in Figure 1 was most like the sequence predicted 
from the analysis of the information processing demand of the tasks. The only 
significant exceptions were variables 29 and 24, both of which are apparently 
learned later than predicted. 


A multiple regression analysis was also computed. Multiple R (with M-space
and short-term memory variables as predictors) confirmed the division of variables
into groups shown in Figure 1, on the basis of the relative strength of Multiple R
for each of the length and number variables.
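For the two-predictor case, Multiple R can be obtained directly from the three pairwise correlations: R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). The Python sketch below assumes one M-space score and one short-term memory score as predictors; the study also used a second span measure, which would need the general least-squares form. Names and data are hypothetical.

```python
import math
from typing import Sequence


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def multiple_r(y: Sequence[float], x1: Sequence[float], x2: Sequence[float]) -> float:
    """Multiple correlation of y with two predictors x1 and x2."""
    ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r_squared = (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical scores for six children on one task.
m_space = [1, 1, 2, 2, 3, 3]
span = [2, 3, 3, 4, 4, 5]
success = [0, 0, 0, 1, 1, 1]
print(round(multiple_r(success, m_space, span), 3))
```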


Factor analysis was used mainly to determine whether increasing levels of 
capacity to process information would appear as separate factors. The procedure 
chosen was the SPSS R-factor analysis (Nie et al., 1975), based on correlations
between variables. Principal factoring without iteration, and oblique rotation
with Kaiser normalisation, were used. Factor 1 accounted for increasing
capacity with age and contributed most highly to those variables which required 
binary operations. Factor 2 accounted for lower level comparisons probably 
dependent on perception. Factor analysis confirmed the covariation of the variables 
within the ordering shown in Figure 1. 


The variables were also cross-tabulated with sex, measures of M-space, short-term
memory and age. Cross-tabulations by sex showed no significant difference in
performance on any of the tasks between boys and girls. The other cross-tabulations
showed success on certain tasks at an earlier age than predicted from the literature
search or from analysis of the information processing demand of the tasks. When the
50 per cent level of success on each task was located by age, M-space and short-term
memory, the resulting sequences were similar to Figure 1. The sequences most like
that in Figure 1 were those derived from measures of capacity to process information.
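One way to derive a sequence from a 50 per cent criterion is to find, for each task, the lowest group (of age, M-space or span) in which at least half the children succeeded, and then order the tasks by that point. The Python sketch below does this under that assumption; the text does not say exactly how the 50 per cent levels were located, and the names and data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def fifty_percent_sequence(
    success: Dict[str, Sequence[int]], groups: Sequence[int]
) -> List[Tuple[str, Optional[int]]]:
    """Order tasks by the lowest group level at which >= 50% of children succeed.

    success[task][i] is 1 if child i passed the task; groups[i] is child i's
    level on the ordering variable (age group, M-space or span score).
    Tasks never reaching 50% are placed last with onset None.
    """
    levels = sorted(set(groups))
    onset: Dict[str, Optional[int]] = {}
    for task, outcomes in success.items():
        onset[task] = None
        for level in levels:
            scores = [o for o, g in zip(outcomes, groups) if g == level]
            if scores and sum(scores) / len(scores) >= 0.5:
                onset[task] = level
                break
    return sorted(onset.items(), key=lambda item: (item[1] is None, item[1] or 0))


groups = [1, 1, 2, 2, 3, 3]
success = {
    "task A": [1, 1, 1, 1, 1, 1],
    "task B": [0, 0, 1, 0, 1, 1],
    "task C": [0, 0, 0, 0, 0, 1],
}
print(fifty_percent_sequence(success, groups))
```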


DISCUSSION 


The data from this research showed quite clearly that the sequence for 
acquisition of length knowledge is closely related to developing capacity to process 
information. Increasingly complex knowledge of length measuring was predicted 
and shown to occur with increasing M-space and short-term memory. The children 
younger than 6 years in the sample had no direct instruction in measurement 
concepts. The older children may have benefited from school instruction to the limit 
of their capacity. Individual differences could also have occurred because of 
learning styles and environmental opportunities but these did not alter the general 
sequence. 


It can be seen from Figure 1 that there are levels of knowledge for concepts such 
as invariance of length and cardinal number. They do not develop in an all or none 
fashion. Rather levels of knowledge of those concepts appear to be closely related to 
capacity to process information. 


Because the sequence shown in Figure 1 is most like the sequence predicted 
from the analysis of the information processing demand of each task it is asserted 
that the theories of Case and Halford are potentially very useful for curriculum 
planning. A suggested procedure for choice and sequence of curriculum content 
would be: 


* select curriculum content on the basis of a logico/mathematical analysis 


* analyse content in terms of the mathematical concepts and structures that it 
contains 


* attempt to determine the demand that each aspect of content will make on 
capacity to process information 


* match that demand to M-space and short-term memory norms 


* then plan to present each concept in a manner and at a level that is consistent 
with the expected information processing capacity of each child. 
A MODEL OF THE COGNITIVE MEANING OF
MATHEMATICAL EXPRESSIONS 


By PAUL ERNEST 
(School of Education, University of Exeter) 


Summary. The central focus of the paper is the nature of the mental representations of 
the meaning of the linguistic expressions of mathematics. An information-processing 
model for the construction of mental representations is presented. The model provides 
syntactical tree structures as meaning representations. It is argued that the major function 
of written mathematical language in school mathematics is in the initial presentations of 
pupil tasks. An information-processing model for the performance of routine 
mathematical tasks is proposed. The central feature of the model is the major role that 
transformations of mental representations play. The overall model is related to children’s 
conceptual development, and a series of stages in the acquisition of mathematical 
language is proposed. Finally, the model is shown to be consistent with a range of current 
concepts, theories and observational data. 


INTRODUCTION 


A CENTRAL feature of mathematics is its characteristic formal symbolism. Children 
devote a great deal of their time in school to mastering the symbolism of 
mathematics. In understanding written mathematical expressions, children evidently
must form mental representations of these expressions, or at least of their meanings.
However, this account raises a number of questions. 


What are the cognitive processes by which the meaning of a linguistic 
expression of mathematics is apprehended? What types of processes and 
representations are involved and what are their characteristics? Can the process by 
means of which the meanings of mathematical expressions are represented be 
modelled? And, finally, how are such meanings used by learners? 


In this paper partial answers to these questions are proposed in the form of a 
tentative model of meaning for compound mathematical expressions. The model 
focuses on the mental representation of the syntactical structure of compound 
mathematical expressions. It leaves aside the issue of the meanings of individual 
denotative symbols which can include concepts, images, etc. 


The model assumes the syntax of formal mathematics, which is implicit in 
expressions such as $2+3$, $7 \times 11 \geq 77$ and $7x+2=16$.


The structure of languages suitable for expressing mathematics has been 
extensively researched, and there is a consensus that the simplest appropriate 
languages are the first-order predicate languages. These are treated in standard 
mathematical logic texts such as Church (1956), Lightstone (1964), Mendelson 
(1964), Schoenfield (1967), Smullyan (1968) and Bell and Machover (1977). 


A brief formulation of a first-order predicate language L is as follows.
The basic symbols of L consist of sets of:


constants (e.g. one, 2, 4, 0.7, e)
individual variables (e.g. $x$, $y$, $z$)

n-place functions (e.g. $+$, $-$, $\times$)
n-place predicates (e.g. 2, X) 

logical connectives (e.g. $\sim$, $\wedge$)

quantifiers (e.g. $\forall$, $\exists$)
The terms of L are defined inductively as follows:


the basic terms are the constants and the variables. If $t_1, \ldots, t_n$ are terms and $f$ is
an n-place function, then $f t_1 \ldots t_n$ is a term.

The formulas of L are defined inductively as follows:
If $t_1, \ldots, t_n$ are terms and $P$ is an n-place predicate, then $P t_1 \ldots t_n$ is an atomic
formula. If $A$ and $B$ are formulas, and $x$ is a variable, then $\sim A$, $A \rightarrow B$, $A \wedge B$,
$A \vee B$, $A \leftrightarrow B$, $\forall x A$ and $\exists x A$ are all formulas.


The notions of free and bound occurrences of variables, closed terms and 
sentences are defined for L as usual, for example see Bell and Machover (1977). 

The introduction of parentheses, and of definitions such as $3 + 5$ and $a = b$ for $+35$
and $=ab$ respectively, permits the use of the usual linguistic conventions of
mathematics.
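The relation between the prefix (Polish) notation of L and the usual infix notation can be made concrete. The Python sketch below converts a list of prefix tokens into a fully parenthesised infix string, given an arity table; the token set, the arity table and the function name are illustrative assumptions, and only binary symbols are written infix.

```python
from typing import Dict, List, Tuple

# Hypothetical arity table: binary function symbols and the predicate '='.
ARITY: Dict[str, int] = {"+": 2, "-": 2, "×": 2, "=": 2}


def to_infix(tokens: List[str]) -> str:
    """Convert prefix tokens such as ['+', '3', '5'] to '(3 + 5)'."""

    def parse(i: int) -> Tuple[str, int]:
        if i >= len(tokens):
            raise ValueError("expression ends too early")
        symbol = tokens[i]
        n = ARITY.get(symbol, 0)  # constants and variables take no arguments
        i += 1
        args = []
        for _ in range(n):
            arg, i = parse(i)
            args.append(arg)
        if n == 0:
            return symbol, i
        if n == 2:
            return f"({args[0]} {symbol} {args[1]})", i
        return f"{symbol}({', '.join(args)})", i

    result, end = parse(0)
    if end != len(tokens):
        raise ValueError("extra tokens after a complete expression")
    return result


print(to_infix(["+", "3", "5"]))            # (3 + 5)
print(to_infix(["=", "a", "b"]))            # (a = b)
print(to_infix(["×", "+", "7", "2", "3"]))  # ((7 + 2) × 3)
```

Because every symbol's arity is fixed, the prefix form needs no parentheses; they reappear only when the infix convention is adopted.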

The language L provides a precise specification of the grammar of mathematics 
which applies to the symbolic mathematical expressions met by children in school, as 
well as to the abstruse formulas of mathematicians. The model of meaning 
representations that follows will posit cognitive structures or representations 
corresponding to these external strings of symbols. 


A Model of the Cognitive Meaning of Mathematical Expressions 

The central feature of the model is an iterative procedure for analysing the 
syntactic structure of a mathematical expression, and representing it as a partially 
ordered hierarchical structure, that is, a tree. Thus, for example, the syntactical
analysis trees for the term “$(7 + 2) \times 3$” and the open sentence “$2x + 3 = 9$” are
shown in Figure 1.


FIGURE 1
EXAMPLES OF SYNTACTIC ANALYSIS TREES FOR A TERM AND AN OPEN SENTENCE
[Trees for the term $(7+2) \times 3$ and the open sentence $2x+3=9$; diagrams not reproduced.]
A syntactical analysis tree (SA tree) is defined as follows. Formally, an SA tree
is made up of a tree, a set of linguistic entities and a function from the set of nodes 
of the tree to the set of linguistic entities. The underlying tree is a topological tree 
defined as usual, for example in Wilson (1972) or Smullyan (1968). The tree is a 
connected and directed graph made up of a finite number of nodes and edges, with a 
designated initial node. The direction of the edges gives rise to relations of 
immediate succession and succession on the set of nodes. A node without any 
successors is termed a terminal node. For convenience and since no confusion can 
arise, the linguistic entity which is associated with a given node (via the above 
mentioned function) will be identified with that node. 


The concept of a partial SA tree is defined by induction. A single term or 
formula of L is a partial SA tree. A partial SA tree can be extended by the 
application of one of the rule types shown in Figure 2. During the application of a 
rule a single terminal node is replaced by a new node to which is connected at least 
one new terminal node. The result of a rule application is also a partial SA tree. A 
full SA tree is defined to be a partial SA tree to which no further rules can be 
applied. Clearly the terminal nodes of a full SA tree will all consist of basic symbols 
of L. 


The process of constructing a full SA tree requires the construction of a 
sequence of partial SA trees beginning with the initial expression or node alone (the 
minimal SA tree). Each tree is derived from its predecessor by a single rule 
application. Such a sequence is termed an SA sequence for the expression associated
with the initial node. There may be more than one SA sequence for any given 
expression. 
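The SA procedure can be sketched for simple arithmetic and algebraic expressions written in infix form. The Python code below is an illustrative reconstruction, not the author's rule set: it assumes binary operators only, an explicit × for multiplication (so 2x is written 2×x), and a main-operator rule based on an assumed precedence table (lowest precedence outside parentheses, rightmost on ties). Each rule application replaces one terminal node by its main operator with two new terminal nodes, so the number of applications is one less than the length of the SA sequence. Pending sub-expressions are taken leftmost-first from the most recently produced pair, the strategy mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed precedence: the loosest-binding operator is chosen as main operator.
PRECEDENCE = {"=": 0, "+": 1, "-": 1, "×": 2}


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)


def strip_outer(s: str) -> str:
    """Remove parentheses that enclose the whole expression, e.g. '(7+2)' -> '7+2'."""
    s = s.strip()
    while s.startswith("(") and s.endswith(")"):
        depth = 0
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            if depth == 0 and i < len(s) - 1:
                return s  # the first '(' closes early, e.g. '(1+2)×(3+4)'
        s = s[1:-1].strip()
    return s


def main_operator(s: str) -> Optional[int]:
    """Index of the main operator outside all parentheses, or None for a basic symbol."""
    depth, best = 0, None
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and ch in PRECEDENCE:
            if best is None or PRECEDENCE[ch] <= PRECEDENCE[s[best]]:
                best = i
    return best


def analyse(expression: str) -> Tuple[Node, int]:
    """Build a full SA tree; return it with the number of rule applications."""
    root = Node(strip_outer(expression))
    pending = [root]  # terminal nodes not yet analysed
    applications = 0
    while pending:
        node = pending.pop()
        k = main_operator(node.text)
        if k is None:
            continue  # basic symbol: stays a terminal node
        left, right = strip_outer(node.text[:k]), strip_outer(node.text[k + 1:])
        node.text, node.children = node.text[k], [Node(left), Node(right)]
        applications += 1
        pending.extend(reversed(node.children))  # left child is popped first
    return root, applications


def show(node: Node, depth: int = 0) -> List[str]:
    """Indented listing of the tree, one node per line."""
    lines = ["  " * depth + node.text]
    for child in node.children:
        lines += show(child, depth + 1)
    return lines


tree, steps = analyse("(7+2)×3")
print("\n".join(show(tree)))
print("rule applications:", steps)
```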


It is now possible to sketch the proposed theory of cognitive meaning. The 
process of understanding a mathematical expression (shown in flow diagram form in 
Figure 3) is as follows. The mathematical expression is visually scanned by the 
reader whose gaze may rest on the expression for a while. A representation of the 
surface structure of the expression is formed. This representation is checked for 
understanding, which involves checking that all of the symbols are known and 
checking that the complexity or length of the expression is manageable. If either of 
these two tests is failed, the procedure is aborted and referred to a decision-making
mental executive function. Otherwise, the syntactical analysis procedure (illustrated 
as a flow diagram in Figure 4) is called and executed. In this procedure, the main 
operator of the expression is located and identified. The appropriate SA rule is 
applied to the expression. This results in a new partial SA tree, the successor in an SA
sequence. If any sub-expressions remain unanalysed, one is chosen by means of 
some strategy (for example, choose the leftmost of the most recently produced sub- 
expressions) and the procedure is reiterated. Otherwise, a full SA tree has been 
constructed. Next the meanings of the basic symbols (the nodes) of the SA tree are 
located in long-term semantic memory. The representation of the deep structure of 
the expression is now constructed, with the meaning structure unpacked and the 
meanings of the basic symbols activated or ready for activation. The representation
can now be used, either as the starting point for the solution of a routine problem or 
simply for the purposes of understanding and as an addition to previous knowledge. 


This is the proposed model for the understanding of mathematical expressions 
in its most general and simplest form. It provides a model for the generation of 
cognitive representations of deep structures, the meanings of mathematical 
expressions. 


It is not intended to suggest that the full model presented above, and indeed the
full language L, are fully available in the minds of young learners. On the contrary,
the sub-languages of L grasped, the subsets of SA rules available and the complexity
of linguistic expressions which can be represented will all increase as the learner
develops. These matters will be further treated below.


FIGURE 2
RULES FOR EXTENDING A PARTIAL SYNTACTIC ANALYSIS TREE
[Figure not reproduced.]


FIGURE 3
PROCEDURE FOR EXTRACTING MEANING FROM A MATHEMATICAL EXPRESSION
[Flow diagram: START meaning extraction procedure → scan expression visually → form surface representation → are symbols known? (if not: abort meaning extraction, STOP) → call syntactic analysis procedure → EXECUTE: generate full SA tree with SA procedure → locate basic symbol meanings in semantic memory → FINISH: deep structure representation available for use.]


FIGURE 4
[Flow diagram of the syntactical analysis procedure; not reproduced.]


Transformations of Representations 

Cognitive scientists are growing increasingly interested in the mental 
representations of meanings that are built up during reading. Tree-like hierarchical 
models of these representations are proposed by Kintsch (1978), Anderson (1983) 
and others. There is evidently an analogy between such models and that proposed in 
the previous section. However, the functions of these models are not the same. Deep 
level representations built up during reading allow the extraction of salient 
components, which contribute to the construction of a larger schema or meaning- 
context. Thus, representations are made for the purposes of comprehension in the 
broad sense, which is the construction of a larger meaning context (Scardamalia and 
Bereiter, 1984) and the building of schemas and hypothesis testing (Baker and 
Brown, 1984). This is not the case with younger and poorer readers, who focus on 
reading as a decoding process rather than as a meaning-getting process, according to 
research studies cited in Baker and Brown (1984). 


In contrast, the routine presentation of mathematics expressions in the 
classroom is not usually a comprehension exercise contributing to the construction 
of a larger meaning context. Routinely, mathematics expressions are presented as 
the initial state of a task. As such, a mathematics expression is without the richness 
of context of a read story, for example. Rather, it is an independent task 
presentation, at best one of a sequence of similar exercises. 


Although the exact proportion will vary according to a number of different 
factors, a significant proportion of the time used for the learning of mathematics is 
spent on routine mathematical tasks. Many, perhaps most of these tasks, are 
devoted to the transformation of one expression into another. Consider arithmetical 
computations. A whole number addition task concerns the transformation of a 
complex numeral such as $3 + 5$, $19 + 27$ or $21 + 723 + 972$ into a simpler numeral ($8$,
$46$ and $1716$, respectively, in the examples). Whole number subtraction,
multiplication and division tasks similarly concern the transformation of numerals. 
So do the four operations applied to (vulgar) fractional numerals, decimal 
(fractional) numerals, and integer numerals (directed number symbols). In addition 
to these computational transformations, arithmetic abounds with simplifications 
and conversions including denary numeral expansion, the equivalence and 
simplification of fractional numerals and of ratio expressions, the inter-conversion 
of numerals for vulgar fractions, decimals and percentages, and the conversion of 
numerals (whole number and decimal) into standard form. 
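At the surface level, the addition tasks above are transformations of one numeral into another by a digit-by-digit algorithm. The Python sketch below implements ordinary column addition on numeral strings as one example of such a surface rule; the function name is hypothetical.

```python
from typing import List


def column_add(numerals: List[str]) -> str:
    """Add whole-number numerals column by column, right to left, carrying tens."""
    width = max(len(n) for n in numerals)
    padded = [n.rjust(width, "0") for n in numerals]
    digits, carry = [], 0
    for pos in range(width - 1, -1, -1):
        column_sum = carry + sum(int(n[pos]) for n in padded)
        digits.append(str(column_sum % 10))  # write the units digit
        carry = column_sum // 10             # carry the rest to the next column
    while carry:
        digits.append(str(carry % 10))
        carry //= 10
    return "".join(reversed(digits)).lstrip("0") or "0"


for task in (["3", "5"], ["19", "27"], ["21", "723", "972"]):
    print(" + ".join(task), "=", column_add(task))
```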


In the realm of measures, expressions in different units are inter-converted 
within the topics of length, area, volume, capacity, time, money, angle, etc. 
Transformations are very important in algebra too. Much time is devoted to the 
manipulation and simplification of algebraic expressions. Algebraic equations and 
inequations are solved by sequences of transformations. Trigonometrical terms and 
equations are similarly treated. Mathematical proofs rely centrally on the 
transformations of expressions in accordance with formal rules. 


This list provides an indication of the extent to which transformations pervade 
school mathematics, via tasks which begin with a written mathematical expression 
and which require a transformation for their completion. The list does not include 
spatial as opposed to linguistic transformations (e.g., topological equivalence, 
symmetry transformations, motion geometry), nor does it include the
transformations required in problem solving (translation of a practical task,
situation or verbal problem into a mathematical expression). It would not be too far
from the truth to describe school mathematics as the study of equivalence relations 
(for most of the transformations concerned generate equivalence relations) in the 
domains of number, measure, algebra and geometry. 


The central point made is that mathematical expressions are commonly 
presented, not for purposes of comprehension, in the psycholinguistic sense of 
contributing to the construction of a larger meaning context, but as the initial state 
of a mathematical task which will be transformed in the performance of the task. 


Tasks can be performed at the surface level by the manipulation of 
mathematical expressions in accordance with surface rules or algorithms. However, 
tasks can also be performed at the deep level by the transformation of the mental 
representations of expressions. The deep level transformations employed may 
mirror surface transformations precisely, such as commutativity of whole numbers 
relative to addition, or not, as in the use of a counting on strategy to perform a 
mental subtraction task such as 101 − 89. One possible counting on strategy for
performing this task is as follows: 89 plus 1 makes 90, 10 more makes 100 (total 11 
so far), one more makes 101, total 12. Whether the deep level transformations 
mirror surface transformations or not, the point made here is that the major use to 
which the deep representations of mathematics expressions are put is as starting 
points for transformations moving towards a goal state. 
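The counting on strategy for 101 − 89 described above can be written out as a short Python sketch. It is an illustration only: the function name `count_on_subtract` is hypothetical, it assumes the minuend is not smaller than the subtrahend, and the three jumps (to the next ten, then in tens, then in ones) follow the one possible strategy given in the text rather than any claim about how all children proceed.

```python
def count_on_subtract(minuend: int, subtrahend: int) -> int:
    """Compute minuend - subtrahend by counting on from the subtrahend."""
    position, total = subtrahend, 0
    # Count on in ones to the next multiple of ten (89 -> 90, total 1).
    while position % 10 != 0 and position < minuend:
        position += 1
        total += 1
    # Count on in tens while this does not overshoot (90 -> 100, total 11).
    while position + 10 <= minuend:
        position += 10
        total += 10
    # Count on the remaining ones (100 -> 101, total 12).
    while position < minuend:
        position += 1
        total += 1
    return total

print(count_on_subtract(101, 89))  # 12
```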


There are a number of such transformations, for example the commutativity 
transformations for addition terms. This is shown in Figure 5, with some further 
transformations of arithmetical terms. 


When this transformation is applied to the deep level representation of a term 
of the form a+b, the result is a term of the form b+a. This is probably the first 
deep level mathematical transformation learned by children during the normal 
course of schooling. Resnick and Ford (1981) report experimental (chronometric) 
evidence that commutativity is used in the third of three stages in the development of 
whole number addition. The existence of these three stages is confirmed by the 
longitudinal study of Carpenter and Moser (1984). A child at the first stage 
computes a+b by counting a steps and then counting b steps, to arrive at an overall 
count (the count all strategy). At the second stage, a child counts on b starting from 
a. At the third stage, a child computes a + b or b + a, whichever is more efficient,
by counting on MIN(a, b) steps (the lesser of a and b) from MAX(a, b), the greater
of a and b. (Note that when a = b the empirical evidence suggests that the value of
a+b is retrieved from memory.) 
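The three stages can be contrasted by counting the steps each strategy takes. In this Python sketch (an illustration, not taken from Resnick and Ford or from Carpenter and Moser), each function returns the sum together with the number of counting steps; the function names are hypothetical, and treating doubles (a = b) as a zero-step retrieval from memory reflects the parenthetical note above.

```python
def count_all(a: int, b: int) -> tuple:
    """Stage 1: count a steps, then b further steps, from the start."""
    count = 0
    for _ in range(a):
        count += 1
    for _ in range(b):
        count += 1
    return count, a + b  # (sum, counting steps taken)

def count_on(a: int, b: int) -> tuple:
    """Stage 2: start at a and count on b steps."""
    count = a
    for _ in range(b):
        count += 1
    return count, b

def count_min(a: int, b: int) -> tuple:
    """Stage 3: apply commutativity when it helps, counting on MIN(a, b) from MAX(a, b)."""
    if a == b:
        return a + b, 0  # doubles: value retrieved from memory, no counting
    return count_on(max(a, b), min(a, b))

print(count_all(2, 7), count_on(2, 7), count_min(2, 7))  # (9, 9) (9, 7) (9, 2)
```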


According to the present model a child at the third stage is able to apply a 
commutativity transformation to the representation of a+b, and then applies the 
count on strategy to evaluate the representation of a+b by the representation c, 
where c=a+b. 


In addition to the commutativity transformation, Figure 5 shows associativity, 
identity and distributivity transformations which are brought into play by learners, 
as are the corresponding transformations for multiplication, and the other 
distributivity transformation. It is not proposed that these transformations are 
learned from formal arithmetical axioms. It is not certain that exposure to such 
statements will facilitate a child's construction of these transformations. Rather it is 
proposed that these transformations are rules which are induced by the learner from 
a number of experiences in using deep level representations of number expressions to 
carry out calculations, just as seems to be the case for the commutativity 
transformation (for addition). 
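The transformations of Figure 5 can be modelled as rewrite rules acting on tree representations of terms. The Python sketch below is an assumption-laden illustration, not the author's formalism: a term is either an integer or a tuple `(op, left, right)`, and the function names are hypothetical. The `evaluate` helper is included only to check that each transformation preserves the value of the term.

```python
# A term is an int (a numeral) or a tuple (op, left, right) with op in {"+", "*"}.

def evaluate(term):
    """Value of a term; used only to check that transformations preserve value."""
    if isinstance(term, int):
        return term
    op, left, right = term
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)

def commute(term):
    """Commutativity: a + b -> b + a (likewise a * b -> b * a)."""
    op, a, b = term
    return (op, b, a)

def associate(term):
    """Associativity: (a + b) + c -> a + (b + c) (likewise for *)."""
    op, (inner_op, a, b), c = term
    if inner_op != op:
        raise ValueError("associativity needs the same operator twice")
    return (op, a, (op, b, c))

def drop_identity(term):
    """Identity: a + 0 -> a and a * 1 -> a; other terms are returned unchanged."""
    op, a, b = term
    return a if (op, b) in (("+", 0), ("*", 1)) else term

def distribute(term):
    """Distributivity: a * (b + c) -> a * b + a * c."""
    op, a, (inner_op, b, c) = term
    if (op, inner_op) != ("*", "+"):
        raise ValueError("expected a term of the form a * (b + c)")
    return ("+", ("*", a, b), ("*", a, c))

t = ("*", 3, ("+", 4, 5))
print(distribute(t), evaluate(distribute(t)) == evaluate(t))
```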
FIGURE 5
TRANSFORMATIONS OF THE REPRESENTATIONS OF ARITHMETICAL TERMS

In contrast to this view, Hope (1985) reports the acquisition of a transformation
(the rule for factorising the difference of two squares) by the expert calculator Aitken
on the basis of a single remark of his teacher. It is likely, however, that expert
calculators are unusually receptive to learning such transformations because of their
passion for mental calculation (Hope, 1985).


Chunking 

The construction of deep level representations of mathematical expressions 
(syntactical analysis trees) and the transformation of these representations take 
place in short-term memory. This memory has a limited number of slots for holding
units or chunks of information (representational units) for processing (Miller, 1956; 
Bell et al., 1983). The ability to process increasingly complex representations, which 
develops during childhood, is accounted for by chunking. This is the combination of 
a number of connected items of information into a single chunk which, despite its 
complexity, only occupies a single slot in short-term memory (Resnick and Ford, 
1981; Bell et al., 1983). 
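The slot model of short-term memory and the effect of chunking can be illustrated with a toy Python sketch. It assumes a fixed capacity of seven slots (taking Miller's "seven plus or minus two" at its midpoint) and treats each chunk as occupying exactly one slot; it does not model the prior knowledge that makes a chunk meaningful (here, that 1492, 1776 and 1066 are familiar dates). The names are hypothetical.

```python
# Assumption: short-term memory modelled as 7 slots (Miller's "seven plus or minus two").
STM_SLOTS = 7

def chunk(items, size):
    """Combine consecutive items into chunks; each chunk occupies one slot."""
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]

def fits_in_stm(units):
    """True if the units (single items or chunks) fit in the available slots."""
    return len(units) <= STM_SLOTS

digits = list("149217761066")  # 12 separate items
dates = chunk(digits, 4)       # 3 chunks: 1492, 1776, 1066

print(fits_in_stm(digits), fits_in_stm(dates))  # False True
```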


In the context of the present proposals, the use of partial syntactical analysis 
trees as deep level representations of mathematical expressions can be accounted for 
by the phenomenon of chunking. A learner experienced at constructing deep level 
representations for a class of complex expressions, for example multi-digit 
numerals, will become able to represent one of these expressions as a single chunk. 
Thus, a multi-symbol expression, or part of an expression, can be represented at the 
deep level by a single representational unit, a chunk. This representation will no 
longer mark an incompletely analysed complex, as it did at an earlier stage in 
development. Rather it is a meaningful deep level representation in its own right, 
albeit one which can be further analysed. 


Investigations into the use of chunking in the memorisation and processing of 
chemical formulas and equations are reported in Johnstone and Kellett (1980). The 
findings suggest that novices failed to memorise the structural formula for
methyl-ethyl-ester, for example, because it is made up of 14 letters and 13 bonds, more units
of information than the 7±2 that short-term memory holds. Masters, however,
succeeded at this task because they memorised the task in three chunks, well within 
the limits of memory capacity. 


In mathematics learning, chunking can also serve to reduce the demands on 
short-term memory. A dramatic account of the use of chunking to improve memory 
span on digit recall tasks is provided by Chase and Ericsson (1981). Chunking can 
simplify the representations of mathematical expressions, for example, linear 
equations. Figure 6 illustrates two representations of a linear equation, one in full 
and the other in chunked form. 


Initially, the equation is represented by a full SA tree. This includes the 
representation of the two place function a − b by a + Neg(b), where Neg is the one
place function Neg(x) = −x. With experience, the sub-expressions 2x and −7 are
each represented as a single chunk, as is illustrated. 
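The saving produced by chunking can be made concrete by counting the units in a full and a chunked SA tree. In this Python sketch the equation 2x − 7 = 1 is used purely for illustration, trees are nested tuples `(function, arguments...)`, and counting nodes is taken as a rough proxy for the number of representational units held in short-term memory; all of these are assumptions of the example.

```python
def count_nodes(tree):
    """Count the representational units (nodes) in an SA tree."""
    if not isinstance(tree, tuple):
        return 1  # a leaf: a constant, a variable, or a chunk
    return 1 + sum(count_nodes(child) for child in tree[1:])

# Full tree: a - b is represented as a + Neg(b), and 2x as the product of 2 and x.
full_tree = ("=", ("+", ("*", 2, "x"), ("Neg", 7)), 1)
# Chunked tree: the sub-expressions 2x and -7 are each a single unit.
chunked_tree = ("=", ("+", "2x", "-7"), 1)

print(count_nodes(full_tree), count_nodes(chunked_tree))  # 8 5
```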


The symbolism of mathematics is very economical, and the limited number of 
basic symbols used to build compound terms assists in their representation as single 
chunks of information at the deep level. In compound numerals, the linear 
concatenation of digits alone suffices to symbolise the addition and denomination 
functions. Concatenation also symbolises multiplication (in algebra), addition (in 
mixed rational numbers), function application and predicate application (in 
predicate logic), a point also made by Skemp (1982). This economy of symbolism
facilitates and encourages chunking, since the operation expressed is implicit in a
string of symbols. It also makes the meanings of mathematical expressions context-
dependent. One consequence is that the SA rules which are applied during the
construction of a deep level representation of an expression are not wholly
determined by the symbols in the expression. The context of the expression, the
frame, to use the term of Davis (1984) and Minsky (1975), will play an essential role
in determining which SA rules are applied.

FIGURE 6
FULL AND CHUNKED SYNTACTICAL ANALYSIS TREES
[Left: full SA tree for the equation ‘2x − 7 = 1’. Right: chunked tree for the same equation.]


The partial unpacking which occurs in the construction of SA trees as a result of 
chunking is an abbreviation of the process of deep level representation. The more 
general tendency for both representations and procedures to become curtailed has 
been identified as a feature of the development of mathematical thinking by 
Krutetskii (1976) and Vergnaud (1986). 


The process of curtailment also leads to the development of additional SA 
rules, which result from the combination of SA rules and transformation rules. An 
example is provided by the rule for the representation of a term of the form 
a+b+c+d. Just as the lack of parentheses in the term reflects the associativity of 
addition, so too the associativity transformations enable the term to be represented 
by four addends operated on by the two place addition operator (several times). The 
action of the several addition operators is to create what might be termed an additive 
field (by analogy with a force field) acting on the addends. This is represented as a 
multi-place summing function (Σ), which is shown acting on the addends (a, b, c and
d) in Figure 7. This rule results from the function rule (for +) and the associativity
transformation. Sleeman (1984) also proposes a summing function, which he terms a
plus bag. However, Sleeman regards this operator as likely to be part of
inadequately completed representations, rather than of the abbreviated
but correct representations that result from chunking.
A multiplicative form of the summing operator is also proposed (in the present
model), in view of analogous commutativity and associativity transformations for 
multiplication. Figure 7 also shows two permutation transformations which can be 
applied to the results of applying the compound addition analysis rule. 
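The derived analysis rule and the permutation transformations can be sketched in Python. This is an illustration under stated assumptions: nested binary additions are tuples `("+", left, right)`, the multi-place summing function Σ is written as `("SUM", [addends])`, and the function names are hypothetical. The sketch shows that different bracketings, which the associativity transformations relate, all analyse to the same Σ representation.

```python
def flatten_sum(term):
    """Derived SA rule: collect the addends of nested binary '+' terms into
    one multi-place summing function, written here as ('SUM', [addends])."""
    def addends(t):
        if isinstance(t, tuple) and t[0] == "+":
            return addends(t[1]) + addends(t[2])
        return [t]
    return ("SUM", addends(term))

def swap_addends(sum_term, i, j):
    """Permutation transformation: exchange the i-th and j-th addends."""
    tag, items = sum_term
    items = list(items)
    items[i], items[j] = items[j], items[i]
    return (tag, items)

left_nested = ("+", ("+", ("+", "a", "b"), "c"), "d")   # ((a + b) + c) + d
right_nested = ("+", "a", ("+", "b", ("+", "c", "d")))  # a + (b + (c + d))

print(flatten_sum(left_nested) == flatten_sum(right_nested))  # True
print(swap_addends(flatten_sum(left_nested), 0, 3))  # ('SUM', ['d', 'b', 'c', 'a'])
```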


FIGURE 7
DERIVED SYNTACTICAL ANALYSIS RULE AND TRANSFORMATIONS FOR TERMS OF THE FORM a + b + c + d

Additional rules for transforming chunked representations develop. These
include a number of transformations of the representations of equations, which are 
shown in Figure 8. Both of the transformations illustrate the parity of treatment of 
the two equated terms within an identity. In the subtraction transformation negative 
three is added to each side (in the form of the simpler equivalent subtract 3). In the 
halving transformation, the halving operator (a one place function h(x) defined by
h(x) = x ÷ 2) is applied to both sides.
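The parity of treatment of the two sides of an equation can be illustrated with a small Python sketch. It assumes a linear equation a·x + b = c stored as the tuple `(a, b, c)` and uses the hypothetical equation 2x + 3 = 11; the function names are also hypothetical. Applying a function to the coefficients separately is valid only because halving is linear, which the comment notes.

```python
from fractions import Fraction

def add_to_both_sides(eq, k):
    """Add k to each side of a*x + b = c (k = -3 acts as 'subtract 3')."""
    a, b, c = eq
    return (a, b + k, c + k)

def apply_to_both_sides(eq, f):
    """Apply a one-place function f to each side of a*x + b = c.
    Applying f to a and b separately is valid here because f is linear
    (f(a*x + b) = f(a)*x + f(b)), as halving is."""
    a, b, c = eq
    return (f(a), f(b), f(c))

def halve(x):
    """The halving operator h(x) = x / 2, kept exact with Fraction."""
    return Fraction(x) / 2

eq = (2, 3, 11)                      # 2x + 3 = 11 (hypothetical example)
eq = add_to_both_sides(eq, -3)       # 2x = 8
eq = apply_to_both_sides(eq, halve)  # x = 4
print(eq == (1, 0, 4))               # True
```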

FIGURE 8
EXAMPLES OF TRANSFORMATIONS APPLIED TO THE REPRESENTATIONS OF EQUATIONS

Both chunking and other forms of curtailment result in modifications to, and
increased numbers of, rules of analysis for representation and transformation. Only
samples of these rules have been illustrated. Further and possibly more individual
rules are developed, especially for the purposes of facilitating computation. These
rules include substitutional transformation rules in which one numeral
representation is replaced by another as, for example, when 10 − 1 is substituted for
9 in 150 × 9, resulting in the transformed computation 150 × (10 − 1). Substitution
transformations like this replace operations by one or more other operations which
are perceived to be simpler. In the example, ×9 is replaced by ×10, ×1 and −150.
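The substitution 150 × 9 → 150 × (10 − 1) generalises to any multiplier just below a multiple of ten. The Python sketch below is an illustration only (the function name is hypothetical): it substitutes r − d for m, where r is m rounded up to the next multiple of ten, so the hard multiplication is replaced by an easy ×r and a small subtraction.

```python
def multiply_by_subtractive_distribution(n: int, m: int) -> int:
    """Compute n * m by substituting (r - d) for m, where r is m rounded up
    to a multiple of 10, giving n * r - n * d."""
    r = -(-m // 10) * 10  # round m up to the next multiple of 10 (9 -> 10, 38 -> 40)
    d = r - m
    return n * r - n * d

print(multiply_by_subtractive_distribution(150, 9))  # 1350, i.e. 1500 - 150
```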


The rules which develop may also include compensatory transformation rules in 
which two mutually compensating transformations are made, as for example in 
transforming the representation of 820 × 25 to that of ¼ × 820 × 100. In Hope (1985)
a more extensive range of methods used for mental calculation is categorised. Whilst
his focus is the range of methods used by experts, including calculating prodigies,
many of the simpler transformations are used more widely, such as the two types of
transformation above, which are classified as subtractive distribution and aliquot
parts, respectively. What seems to distinguish experts from the less expert is interest
and practice in mental calculation. A consequence of extensive practice is the 
development of an increased range of computation facilitating transformations. 
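The compensatory (aliquot parts) transformation can be sketched in Python as follows. This is an illustration, not Hope's formulation: a multiplier that is an aliquot part of a power of ten (25 = 100 ÷ 4, 50 = 100 ÷ 2, 125 = 1000 ÷ 8) is replaced by multiplication by the power of ten, compensated by a division. The table `ALIQUOT` and the function name are hypothetical.

```python
# Hypothetical table: multiplier -> (power of ten, divisor), with multiplier = power / divisor.
ALIQUOT = {25: (100, 4), 50: (100, 2), 125: (1000, 8)}

def multiply_by_aliquot_parts(n: int, m: int) -> int:
    """Compensatory transformation: n * m -> (n * power) / divisor.
    The division is exact because the divisor divides the power of ten."""
    power, divisor = ALIQUOT[m]
    return n * power // divisor

print(multiply_by_aliquot_parts(820, 25))  # 20500
```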


One further class of transformations needs to be mentioned. These are the 
evaluation transformations which replace a compound numerical term by its value. 
Given such a term, a procedure for constructing its value is called and executed, and 
the resulting value (correct or not, as the case may be) is substituted for the original 
term in the representation. 
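An evaluation transformation can be sketched as a tree walk that hands each compound numerical sub-term to a calculation procedure and substitutes whatever value comes back. In this Python illustration the term format, the function names and the faulty procedure (which adds where it should multiply) are all hypothetical; the faulty procedure shows that the substituted value need not be correct.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def is_numeric(t):
    """True for a numeral or a compound term built only from numerals."""
    return isinstance(t, int) or (isinstance(t, tuple) and all(is_numeric(c) for c in t[1:]))

def evaluate_subterms(term, procedure):
    """Replace each compound numerical sub-term by the value that `procedure`
    returns; sub-terms containing variables are rebuilt with their parts evaluated."""
    if not isinstance(term, tuple):
        return term
    if is_numeric(term):
        return procedure(term)
    op, *args = term
    return (op, *[evaluate_subterms(a, procedure) for a in args])

def correct_procedure(term):
    if isinstance(term, int):
        return term
    op, left, right = term
    return OPS[op](correct_procedure(left), correct_procedure(right))

def faulty_procedure(term):
    """A hypothetical faulty procedure: adds wherever it should multiply."""
    if isinstance(term, int):
        return term
    _, left, right = term
    return faulty_procedure(left) + faulty_procedure(right)

expr = ("+", "x", ("*", 3, 4))  # x + 3 * 4
print(evaluate_subterms(expr, correct_procedure))  # ('+', 'x', 12)
print(evaluate_subterms(expr, faulty_procedure))   # ('+', 'x', 7)
```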


PERFORMING ROUTINE TASKS 

A model of the procedure for performing routine linguistically presented 
mathematical tasks can be constructed with the aid of the previously defined 
concepts. The procedure is shown in Figure 9 in flow diagram form. A learner, faced 
with a routine task presented in the form of a mathematical expression, calls up and 
executes a meaning extraction procedure, as described earlier (see Figure 3). At the 
same time, the learner is trying to comprehend the task in general, and this requires 
the determination of the frame of the task (Davis, 1984). Contextual factors, such as 
task instructions given verbally by a teacher or as part of the written task 
presentation, can help to determine the frame, as can the previous tasks undertaken, 
if any. The symbols in the task presentation also provide clues during the meaning 
extraction procedure, as does the meaning itself. If the frame of the task is 
determined, then appropriate schemas can be activated or called. Otherwise, the 
task cannot be routinely attempted, in which case an executive problem solving 
function is activated. 


In addition to identifying the frame of the task, the comprehension of the task 
requires the identification of its goal. This may be a separate activity, or could be 
provided by the frame. When the frame and goal are identified, the next step is the 
identification and retrieval of a stored plan for carrying out the task, which may 
again be assisted by the frame. Just as in the case of the frame, if either the goal of 
the task or a plan for its execution cannot be identified, then the problem solving 
executive is alerted. 


In the case where there is a failure to identify the frame, goal or plan for a task, 
then the intervention of the problem solving executive is required. In this case, the 
task can no longer be described as routine (relative to the given learner), but as a 
problem to be solved. 

The solution of a problem task requires the comprehension of the task, which
may involve the subsumption of the task under a more general frame, as well as a greater
conscious effort in the identification of the goal of the task.


The crucial function for the problem solving executive, once the problem is
comprehended, is the construction of a plan for carrying out the task. If a particular
plan is constructed and used repeatedly for a given class of problems, it will be
stored in long-term memory and can be retrieved for the performance of routine
tasks. In this way, it may contribute to the development of a new frame.

FIGURE 9
PROCEDURE FOR PERFORMING ROUTINE (WRITTEN)
[Flow diagram: task performance procedure.]


A plan for performing a routine task will typically consist of a sequence of 
subgoals, which leads to the overall task goal. These subgoals may have associated 
pointers which indicate transformations which can be used for their attainment. 


In the performance of a routine task, a plan is identified, called and executed. 
The execution of the plan involves the determination of the appropriate subgoal, 
and then the selection of an appropriate transformation of the meaning 
representation of the task expression to achieve the subgoal. When the selection has 
been made, the transformation is applied to the meaning representation. During the 
application of the transformation, the effort and difficulty involved are monitored. 
If these are too great, then a decision is made as to whether the attempt to attain the 
subgoal should continue. If the attempt is continued, an alternative transformation 
is sought. After the meaning representation has been transformed, the result is 
checked to see if the subgoal has been achieved. If it has, then the next subgoal of 
the plan is determined and the cycle repeats, unless the overall goal has been 
achieved. 
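The plan execution cycle just described can be sketched in Python. This is a simplified illustration under stated assumptions: a plan is a list of subgoals, each with a test and candidate transformations; each transformation reports a numerical effort score; the threshold `EFFORT_LIMIT` and the effort values are invented for the example; and effort is checked after a transformation is applied rather than monitored while it runs. Failure to attain a subgoal raises an error standing in for a call to the problem solving executive.

```python
EFFORT_LIMIT = 5  # hypothetical threshold beyond which effort counts as "too great"

def execute_plan(representation, plan):
    """Run a plan: a list of (subgoal_test, candidate_transformations) pairs.
    Each transformation maps a representation to (new_representation, effort)."""
    for subgoal_test, transformations in plan:
        for transform in transformations:
            result, effort = transform(representation)
            if effort > EFFORT_LIMIT:
                continue                 # too effortful: seek an alternative
            if subgoal_test(result):
                representation = result  # subgoal achieved: go to the next subgoal
                break
            # otherwise the subgoal was not reached: try the next candidate
        else:
            # no candidate worked, so the task is no longer routine
            raise RuntimeError("subgoal not attained; call the problem solving executive")
    return representation  # every subgoal, and so the overall goal, achieved

# Hypothetical transformations for the task 150 x 9, with made-up effort scores.
def long_multiplication(rep):
    _, a, b = rep
    return a * b, 8

def substitute_ten_minus_one(rep):
    _, a, b = rep
    return a * 10 - a * (10 - b), 3  # a * b = a * 10 - a * (10 - b)

plan = [(lambda rep: isinstance(rep, int), [long_multiplication, substitute_ten_minus_one])]
print(execute_plan(("*", 150, 9), plan))  # 1350
```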


This completes the description of the proposed model for the performance of a 
routine task. In achieving its goals, such a procedure operates smoothly, moving 
from one stage to the next. In practice, some of the stages represented sequentially 
may occur simultaneously. An attempt has been made to symbolise this in Figure 9 
by attaching decision diamonds to instruction boxes. 


A central feature of the task performance procedure, in addition to the 
previously treated meaning extraction procedure and meaning transformations, is 
the role of the executive function. This meta-cognitive function monitors the 
processes and makes decisions as well as solving problems if required. 


In Figure 9, the decision boxes (diamonds) represent executive decisions based 
on a metacognitive monitoring of the progress of the procedure. Some of the 
negative outcomes of these decisions reflect a breakdown in the smooth 
performance of the task and require the problem solving executive. This 
metacognitive function probably requires increased consciousness, as Brown (1978) 
proposes, as opposed to the more automatic functioning of procedures and 
transformations. 


Although the term metacognition has been applied in different contexts with 
different meanings (Yussen, 1985; Kilpatrick, 1986), there is substantial agreement 
that it applies to knowledge, planning and control of cognitive processes, as in the 
present paper. The planning involved in problem solving proper is a metacognitive 
activity (Flavell and Wellman, 1977; Brown, 1978; Baker and Brown, 1984; Yussen, 
1985). The monitoring and control of cognitive processes, such as is required in the 
task performance procedure, is also a metacognitive activity (Campione and Brown, 
1978; Flavell, 1979; Baker and Brown, 1984; Bialystok and Ryan, 1985; Yussen,
1985). Thus it seems that the task performance procedure can be seen as taking place 
at two levels. The execution of stored plans, routines and procedures takes place at 
the cognitive level. The monitoring and control of these processes takes place at the 
metacognitive level. From the nature of the activities involved, the increased 
requirement of consciousness at the metacognitive level (Brown, 1978) seems highly 
plausible. 


The author has had introspective experience of mental calculation consistent 
with (this model of) the task performance procedure. Given a computation to 
perform mentally, I formed a mental representation of the compound numeral 
expressing the task, and chose a transformation. On applying the transformation, I
experienced “resistance” and perceived that an excessive effort (more than seemed
necessary) was involved. I chose an alternative transformation, applied it 
successfully, and reached the goal state. Only after applying the second 
transformation did I become aware of the whole procedure to the extent that I 
consciously retraced the process, thereby creating an enduring memory of the event. 
This episode illustrates the metacognitive monitoring of mental effort, the triggering 
of a change of attack when a high level of effort was recognised and the choice and 
application of an alternative transformation. 


The experience provides confirmation of the task performance procedure 
proposed above, although clearly other models or procedures could also account 
for it. 


The example cited consists of a purely mental procedure for performing a 
routine task. Other routine tasks may involve the recording of mathematical 
expressions at various points during their performance. Davis (1984) refers to this 
type of performance as a visually moderated sequence. According to Davis, one of 
these sequences begins with a visual cue V₁, which elicits a procedure P₁ whose
execution produces a new visual cue V₂, which elicits a procedure P₂, and so
on. In terms of the present model, the visual cues V₁, V₂, … are the
written representations of the transformed meanings, which are constructed when
subgoals are attained. The parallel sequence P₁, P₂, … represents the sequence of
transformations called and executed for achieving subgoals, although a single Pᵢ
could represent a subsequence of transformations performed one after the other. A 
visually moderated sequence allows meaning representations corresponding to 
completed subgoals to be represented symbolically, removing the need for the 
representation (and previous steps in the procedure) to be stored in short-term 
memory. 


The use of visually moderated sequences and chunking may lead to the 
development of different forms of processing closer to the surface level. The 
meaning representations of a task performed in this way may only be partially 
constructed. Not only will sub-expressions be represented by chunks, but only part 
of the expression may have its meaning represented at the deep level, with the 
remainder of the expression visually available (in surface form) for analysis, but not 
represented at the deep level throughout the process. 


A procedure of this type provides an explanation for the performance of
written (external) algorithms, such as the subtraction algorithm for multi-digit 
numerals, in which internal representations are only formed for subsets of the 
symbols. 
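A written subtraction algorithm of the kind mentioned above can be sketched so that each step uses only the digits of one column plus a borrow flag, mirroring the claim that internal representations are formed only for subsets of the symbols. The Python sketch assumes the decomposition (borrowing) method and a non-negative result; the function name is hypothetical.

```python
def column_subtract(minuend: str, subtrahend: str) -> str:
    """Decomposition subtraction, one column at a time: each step attends only
    to the two digits in the current column and the borrow from the previous one."""
    if len(subtrahend) > len(minuend):
        raise ValueError("subtrahend larger than minuend")
    top = [int(d) for d in minuend]
    bottom = [int(d) for d in subtrahend.rjust(len(minuend), "0")]
    result, borrow = [], 0
    for t, b in zip(reversed(top), reversed(bottom)):  # units column first
        t -= borrow
        if t < b:
            t += 10       # borrow a ten from the next column
            borrow = 1
        else:
            borrow = 0
        result.append(str(t - b))
    if borrow:
        raise ValueError("subtrahend larger than minuend")
    return "".join(reversed(result)).lstrip("0") or "0"

print(column_subtract("101", "89"))  # 12
```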


In both the use of visually moderated sequences and the performance of
written algorithms, the transition between stages would seem to be a function of the 
metacognitive executive. 


Linking the model to cognitive development 

A model has been proposed for the mental representation of meaning, the
transformation of meaning representations and the performance of routine 
linguistically presented mathematical tasks. In order to provide a comprehensive 
and general theory it has been described in its most fully developed and elaborated 
form, without any regard to its gradual construction in the mind of the learner. 
Some acknowledgment of the necessity of treating developmental aspects of the 
theory has been made by discussions of the role of chunking and the associated use 
of partial SA trees, the growing list of transformations constructed by learners and 
the use of visually moderated sequences. In order that the theory be psychologically
plausible, these developmental aspects need to be supplemented with accounts of the 
relationship of the model to the acquisition of the language of mathematics, and to 
qualitative changes or stages in the development of understanding in mathematics. 


The study of the acquisition of language and of the stages of development of 
understanding are both vast fields of inquiry in their own right. For this reason, all 
that can be offered here is a brief and suggestive sketch. 


A significant part of the acquisition of language is the learning of the 
vocabulary used in mathematics. This includes numerals (such as one, two, three, 
ten, hundred, million, first, second, third, half), operators (such as halving, 
doubling, add-one), and relations (such as is, equals, less, more), to focus on 
vocabulary from the domain of number alone. According to Halliday (in UNESCO, 
1975) the acquisition of this vocabulary (and the underlying meanings) represents 
the beginning of the development of a mathematics register. 


It seems likely that the mathematical vocabulary, together with other 
vocabulary, is used to construct utterances, most commonly in the form of sentences 
built up from noun phrases and verb phrases, that is, following the rules of natural 
language grammar. The use of this form of syntax means that the grammar of 
mathematics, which is fundamentally that of the predicate calculus described above, 
is not in use in the early stages of the learning of mathematics. Number words and 
operations are initially understood (in their cardinal sense) in concrete terms, such as 
the regrouping of sets of sweets, and children resist attempts to decontextualise 
arithmetical problems (Hughes, 1986). The concrete usage of number is adjectival 
(for example three apples). Subsequently, the use of number words as nouns 
develops, and the underlying concepts can be symbolised by numerals. Although 
natural language depends on context for meaning (Donaldson, 1978; Barwise and
Perry, 1982), children learn the context-free language of formal arithmetic with
terms like 3 + 6, and incomplete sentences like 1 + 5 =. It may reasonably be 
assumed that expressions like these are understood more in terms of procedural 
meaning (requiring the performance of a computation) than in terms of context- 
building meaning (contributing to a larger story or context). 


The result of this process is that numerals come to represent the individual 
constants or entities required for the language of mathematics. In this way numerals 
are transformed from adjectival qualifiers of nouns in noun phrases to nouns, 
taking the central role in noun phrases themselves. 


The operations and functions of mathematics derive from natural language 
verbs such as add, take, halve, and so on, which contribute to verb phrases. It seems 
likely that the binary operations of mathematics are first understood as unary 
operations (Weaver, 1982; Vergnaud, 1982, 1983). Thus, for example, 2+5 is 
understood as ‘two acted on by the operation add five’. It seems likely that addition is
not understood as a binary operation until commutativity is grasped, for until then
the two operands are understood asymmetrically. When a+b and b+a are
understood as interchangeable (in the last of the three stages of the learning of
addition discussed above) then the operands acquire equal status, and the operator
for the first time assumes the role of a binary operator or function. Thus the
commutativity of addition marks the beginning of the acquisition of the grammar of
formal mathematics, a grammar in which n-place functions (for n ≥ 1) act on n
individuals to form terms. For n ≥ 2, as in the case of the binary addition operator,
this contrasts with the grammar of natural language, in which a single operator (a
verb phrase, a verb or an adjective, for example) must act on a single argument (a 
noun phrase, a noun phrase or a noun, respectively, in the examples), to form the 
legitimate expressions of the language. An analogous further development occurs 
when the equality relation (almost certainly the first formal binary relation to be
learned) is understood as a binary relation which is symmetric with regard to its
arguments (the two terms it relates). However, this understanding is not attained
immediately, for, as Kieran (1981), Adda (1982) and Hughes (1986) report, the
equals sign is first understood asymmetrically, as a syntactic marker or ‘do something’
signal (like the solidus or bar in a vertical addition task).


My contention is that, as children learn the symbols and concepts of 
mathematics, they learn the grammar implicit in the use of n-place functions and 
relations (most notably for n=2). This is the grammar of mathematical language, 
which goes beyond the grammar of natural language, and extends it. 


During the years of schooling, children will develop concepts and symbols for 
the following mathematical notions, in something like the order in which they are 
listed: constants, functions, relations, variables, propositional connectives and 
quantifiers. These notions are built up partly from the corresponding notions of 
natural language, and partly from the previously learned mathematical notions. 
Although constants are probably the first category of mathematical expressions to 
be mastered (especially numerals), further constants continue to be acquired 
throughout the years of schooling (for example 4, 0·7, π, e, i). Likewise, an
increasing number of functions (including operations) will be learned by children, 
and indeed the meanings of the functions learned will keep growing as they are 
applied to expanding domains (for example, the meaning of addition changes as it is 
applied to the domains of small whole numbers, large whole numbers, natural 
numbers, positive rational numbers, integers, rational numbers, linear polynomials, 
and so on). 


With the acquisition of constants and functions an increasing range of terms 
can be understood or constructed. The acquisition of the remaining categories (of 
basic mathematical symbols) similarly extends the range of expressions that can be 
understood and constructed, although for many learners understanding falls short 
of terms and sentences including variables, propositional connectives or quantifiers. 


A very important case in the learning of mathematical vocabulary and syntax is 
the acquisition of denary numerals and number concepts. There is agreement that 
the skill of counting, that is both knowing and being able to use the spoken sequence 
of numerals, is a central factor in the acquisition of the concept of number 
(Carpenter, 1980). From the pre-school years through the years of primary or 
elementary school education, the sequence of number names which can be spoken or 
recorded as numerals grows in length. 


This does not mean that the denary place value system which underpins the 
construction of numerals is necessarily understood. Large-scale testing reported in 
Hart (1981) and Assessment of Performance Unit (1985) suggests that most facets of 
whole number place value are understood by 60 per cent to 70 per cent of 11-year-olds.
Thus, for example, 69 per cent of one of the samples tested were able to select a
numeral (from several multi-digit numerals) with a digit 7 representing 7 tens. On 
the basis of this and similar evidence it seems likely that the procedure for 
constructing and writing numerals is acquired before the concept of place value is 
understood. 


One explanation for this is that numerals are understood as part of a sequence, 
which can be the temporal counting sequence or an internally represented number 
line (Ernest, forthcoming), in addition to, or prior to an understanding in terms of 
place value. Thus it is proposed that a denary numeral such as 132 will first be 
learned as a single composite symbol associated with a vocal utterance and related to 
other numbers or a number sequence. After the rules for the analysis of denary 
numerals are learned (that is, when place value concepts and notations have been
mastered, according to the present theory) the numeral can be analysed as is shown 
in Figure 10. 


FIGURE 10 


FULL SA TREE (MEANING REPRESENTATION) 
FOR THE NUMERAL “132”


The diagram shows a full SA tree for the numeral “132”. This involves the two
one-place denomination functions or operators, hundreds (H(x) = 100x) and tens
(T(x) = 10x). These functions act as markers assigning denominations to the
representations of numerals, single-digit numerals in the illustration. The numeral
representation for units clearly needs no such denomination function. Since the
ability to analyse a numeral such as “132” (shown in Figure 10) almost certainly
develops after the numeral is understood as a whole, chunking is unlikely to play any
part initially. It could be said that the representations of small (up to 3-digit)
numerals come “pre-chunked”, that is, they are learned as composites. Where
chunking must play a role is in the comprehension of numerals with a large number
of digits. Figure 11 shows SA tree representations for the numeral 1234567.
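As a minimal sketch of how a full SA tree carries meaning, the tree for “132” can be encoded and evaluated in Python. Only the functions H(x) = 100x and T(x) = 10x come from the text. The nested-tuple encoding, the flat three-way sum and the names `sa_tree` and `evaluate` are illustrative assumptions, and Figure 10 may nest the additions differently.

```python
from typing import Union

# Hypothetical encoding of an SA tree: a digit is an int; ("H", x) and ("T", x)
# apply a denomination function; ("+", t1, t2, ...) adds its subtrees.
Tree = Union[int, tuple]

DENOMINATIONS = {
    "H": lambda x: 100 * x,  # hundreds: H(x) = 100x
    "T": lambda x: 10 * x,   # tens:     T(x) = 10x
}

def sa_tree(numeral: str) -> Tree:
    """Full tree for a 1- to 3-digit numeral; the units digit carries no denomination."""
    if not (numeral.isascii() and numeral.isdigit() and 1 <= len(numeral) <= 3):
        raise ValueError("this sketch handles 1- to 3-digit numerals only")
    markers = ["H", "T", None][3 - len(numeral):]
    parts = [int(d) if m is None else (m, int(d)) for m, d in zip(markers, numeral)]
    return parts[0] if len(parts) == 1 else ("+", *parts)

def evaluate(tree: Tree) -> int:
    """Recover the number a tree denotes by applying its functions bottom-up."""
    if isinstance(tree, int):
        return tree
    op, *args = tree
    if op == "+":
        return sum(evaluate(a) for a in args)
    (arg,) = args
    return DENOMINATIONS[op](evaluate(arg))

tree = sa_tree("132")
print(tree)            # ('+', ('H', 1), ('T', 3), 2)
print(evaluate(tree))  # 132
```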


The full SA tree is quite elaborate, and even if sufficient place value concepts 
are learned for its representation (notably the meaning of the denominations 
million, hundred-thousand, etc. and their application to numerals), as well as the SA 
analysis rules, the demands on short term memory are probably too great. This is 
also likely to be true of the second representation using the X rule. Readers of the 
numeral will not be able to represent all of it simultaneously unless a substantially 
chunked tree, such as that shown, is employed. 
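The memory argument can be made concrete by counting the denominated terms each representation asks the reader to hold at once. This sketch assumes that chunking groups the digits into three-digit periods (millions, thousands, units), each handled as a “pre-chunked” composite. Figure 11's chunked tree may differ, and the helper names are hypothetical. For uniformity the units term is given a scale of 1, although the text notes that units need no denomination function.

```python
def full_tree(numeral: str) -> tuple:
    """One denominated term per digit: ('den', 10**position, digit)."""
    n = len(numeral)
    return ("+", *[("den", 10 ** (n - 1 - i), int(d)) for i, d in enumerate(numeral)])

def chunked_tree(numeral: str) -> tuple:
    """One term per three-digit period, each period treated as a pre-chunked composite."""
    periods = []
    rest = numeral
    while rest:
        periods.insert(0, rest[-3:])  # peel periods off from the right
        rest = rest[:-3]
    n = len(periods)
    return ("+", *[("den", 1000 ** (n - 1 - i), int(p)) for i, p in enumerate(periods)])

def value(tree: tuple) -> int:
    op, *args = tree
    if op == "+":
        return sum(value(t) for t in args)
    scale, chunk = args  # a ('den', scale, chunk) node
    return scale * chunk

for build in (full_tree, chunked_tree):
    t = build("1234567")
    # len(t) - 1 is the number of top-level terms to be held simultaneously
    print(build.__name__, len(t) - 1, value(t))
```

Both trees denote 1234567, but the full tree has seven top-level terms against three for the chunked one.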


The example of denary numerals serves to show that the order of acquisition of 
the rules for the representation of meaning may not be as smooth or as 
straightforward as suggested in the model initially presented. A number of factors 
can interact, making the situation unpredictably complex. These are: the exposure to
mathematical notation (such as is usual with numerals); the growth of the ability to
deal with complex notations, partly through the availability of more slots in short-term
memory and partly through their more efficient use (chunking); and the construction of
derived SA rules on the basis of SA rules and rules of transformation.


It has been suggested that during the learning of written mathematics the 
syntactical rules of natural language are supplemented by and partially replaced by 
rules reflecting the syntax of mathematics. In the realm of linguistic theory, the 
appropriateness of the syntax of mathematics is accepted by authors such as Bartsch 
(1973) and Montague (1974), although the conventional and opposite Chomskyan 
view is proposed by Hurford (1975) in his account of numerals. More recently, in 
Hurford (forthcoming), this view is extended beyond the realm of pure linguistic 
theory to the cognitive domain. Few authors (including those named above) have
acknowledged that there might be (or is, according to the model proposed in
this paper) a development in representations from the syntax of natural language to
the syntax of the language of mathematics. Hurford (forthcoming) analyses 3 + 5 = 8
into a noun phrase and a verb phrase and leaves no room for a shift to occur in the
way this is understood. Bartsch (1973) denies that the sentence ‘there are two houses’
can be understood in terms of a predicate ‘two’ applied to ‘house’, insisting that it is
an equality statement, number(house) = 2, holding between the numeral 2 and the
unary numerosity function number applied to house. In the view of the present
author, both of these proposals are plausible if seen as part of a developmental
sequence. Numbers are initially understood as adjectives (Hurford agrees with this),
later becoming objects in their own right, requiring the more elaborate second form
of Bartsch, since an object, constant or noun, which is what a number or numeral is,
cannot be applied to a set.
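The structural difference between the two readings can be shown in a few lines. The names `two`, `number` and `houses` are illustrative. Modelling the numerosity function with `len` is an assumption made for the sketch, not Bartsch's formalism.

```python
houses = {"house_a", "house_b"}  # hypothetical set of houses

def two(collection) -> bool:
    """Adjectival reading: 'two' is a predicate applied to the collection."""
    return len(collection) == 2

def number(collection) -> int:
    """Bartsch's reading: a unary numerosity function yielding a number object."""
    return len(collection)

adjectival = two(houses)          # predicate 'two' applied to 'house'
equational = number(houses) == 2  # equality statement number(house) = 2
print(adjectival, equational)
```

In the first form the numeral never appears as an object. In the second it is the constant 2 on one side of a relation, which matches the later stage described in the text.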


In terms of development, it is proposed that the grammar and language of 
mathematics available for the representation and understanding of expressions is 
acquired over an extended period of time. Furthermore, the ability to understand and 
to use the different grammatical categories of symbols may be taken to develop in a 
number of distinct stages. The stages in the development of understanding and use 
of mathematical language are shown in simple diagrammatic form in Figure 12. 


The diagram shows the categories of symbols of mathematical language 
representing different stages as a series of steps of ascending abstraction and 
formality. As each stage is reached, so too the facilities associated with previous 
stages are further developed. At the lowest level is the use of natural language. On 
this is built the use of mathematical constants. Subsequently, the use and 
understanding of mathematical functions and operations is acquired. The next stage
involves the use of mathematical relations. Many learners never get beyond this, for
the following stage, the use of individual variables, can be regarded as falling within
the Piagetian stage of formal operations, which many British pupils never attain
(Shayer and Wylam, 1978). The stages that follow become increasingly
formal and difficult. They involve the use of formal logical connectives, the use of 
formal quantifiers, and finally the use of formal metalanguages. 
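One way to read the ordering in Figure 12 is that an expression requires at least the highest stage among the categories of its symbols. The sketch below assumes that reading. The symbol-to-category table is hypothetical and far from complete, and tokens are assumed to be separated by spaces.

```python
# The stages named in the text, lowest first.
STAGES = [
    "natural language", "constants", "functions and operations", "relations",
    "individual variables", "logical connectives", "quantifiers", "metalanguages",
]

# Hypothetical symbol-to-category table (illustrative only).
CATEGORY = {
    "+": "functions and operations", "-": "functions and operations",
    "×": "functions and operations", "÷": "functions and operations",
    "=": "relations", "<": "relations", ">": "relations",
    "¬": "logical connectives", "∧": "logical connectives",
    "∨": "logical connectives", "→": "logical connectives",
    "∀": "quantifiers", "∃": "quantifiers",
    "⊢": "metalanguages",
}

def category(token: str) -> str:
    if token in CATEGORY:
        return CATEGORY[token]
    if token.isdigit():
        return "constants"
    if len(token) == 1 and token.isalpha():
        return "individual variables"
    return "natural language"

def highest_stage(expression: str) -> str:
    """Highest stage required by any token of a space-separated expression."""
    return max((category(t) for t in expression.split()), key=STAGES.index)

print(highest_stage("3 + 5 = 8"))
print(highest_stage("∀ x x + 0 = x"))
```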


According to the model proposed here, the first usage of each of the categories 
of mathematical symbols forms a developmental sequence of stages. It is also the 
case, however, that the extent and depth of usage of the symbols within each 
category also develops. A sketch of the development that takes place within the 
categories of constant, operation and relation has already been given. 


Consider the usage of individual variables. On the basis of extensive empirical
work, Küchemann (reported in Hart, 1981) was able to construct a hierarchy of
understanding in algebra and to distinguish six distinct interpretations of
individual variables by children. Furthermore, this research shows that children's
understanding of algebra and variables progresses and develops during the course of
secondary schooling. Similar developments may be assumed to take place within the
other categories of mathematical symbols shown in Figure 12.


FIGURE 12
STAGES IN THE DEVELOPMENT OF UNDERSTANDING AND USE OF MATHEMATICAL LANGUAGE


The mastery of all of the categories shown is likely to be restricted to a small 
group of individuals, including professional mathematicians and theoretical 
physical scientists. For this select group it is likely that mathematical symbolism will 
be seen as more than simply a means of specifying the initial state of the mathematical
tasks proposed above. Advanced formal thinkers such as these may be able to read
mathematical proofs in some way analogous to readers of natural language text. 
Thus, it can be assumed that this special group are able to engage in the meaning- 
getting and constructing process attributed to expert readers by psycholinguistic 
theory (Baker and Brown, 1984; Scardamalia and Bereiter, 1984). 


DISCUSSION 
A model for the mental representation of the linguistic expressions of 
mathematics and for the use of these representations in the performance of routine 
tasks is presented above. In addition, a brief indication of how the model relates to 
cognitive development is given. The question arises as to how these proposals relate 
to current theories and observational data. 


The adoption of an hierarchical tree structure for mental representations is 
almost universal in linguistic theory (Chomsky, 1957, 1965), in the linguistic 
treatment of numerals (Bartsch, 1973; Hurford, 1975 and forthcoming) and in 
linguistic analyses of algebra (Mayer, 1982; Sleeman, 1984). Mental representations
are also posited to be of this form (Kintsch, 1978; Rumelhart and Norman, 1981;
Anderson, 1983; Resnick, 1983), including those used in mathematical problem
solving (Greeno, 1980). Thus the adoption of this form of structure can be said to be 
consistent with most related work. Of course, trees are simply a powerful and 
convenient way to represent hierarchical structures, and mental representations may 
in fact be in some different but isomorphic form. Nevertheless, a crucial feature of 
representation is accounted for by the model. 


It has been proposed that mental representations are transformed in the 
performance of routine tasks. Transformations are posited both in linguistics 
(Chomsky, 1957, 1965) and in cognition (Chomsky, 1964, 1967; Beilin and Lust, 
1975; Neches and Hayes, 1978; Hope, 1985). In fact, Beilin and Lust propose 
transformations which include an identical-conjunct collapsing scheme. This
transforms a sentence such as “give me the dolls which are boys and give me the
dolls which are girls” into “give me the dolls which are boys and girls”. When this
transformation is displayed in tree form (Beilin and Lust, pp. 205-206), the analogy 
with the distributivity transformation in Figure 5 is evident. It may be conjectured 
that the acquisition of transformations of mathematical representations may be 
facilitated by the previous acquisition of linguistic transformations such as this. Be 
this as it may, the model accounts for the vital transformational component in 
cognitive processing. 
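The analogy can be seen by running one tree transformation on both kinds of tree. The function below, whose name and tuple encoding are hypothetical, collapses two subtrees that share an operator and a first argument. It is offered as an illustration of the shape of the identical-conjunct collapsing scheme, not as Beilin and Lust's formulation. The distributivity transformation in Figure 5 may run in the opposite direction.

```python
def collapse_shared(tree):
    """(op, (f, s, b), (f, s, c)) -> (f, s, (op, b, c)) when both subtrees share f and s."""
    if isinstance(tree, tuple) and len(tree) == 3:
        op, left, right = tree
        if (isinstance(left, tuple) and isinstance(right, tuple)
                and len(left) == 3 and len(right) == 3
                and left[:2] == right[:2]):
            f, shared, b = left
            c = right[2]
            return (f, shared, (op, b, c))
    return tree  # pattern absent: leave the tree unchanged

arithmetic = ("+", ("×", "a", "b"), ("×", "a", "c"))
linguistic = ("and", ("which-are", "dolls", "boys"), ("which-are", "dolls", "girls"))
print(collapse_shared(arithmetic))  # ('×', 'a', ('+', 'b', 'c'))
print(collapse_shared(linguistic))  # ('which-are', 'dolls', ('and', 'boys', 'girls'))
```

The function checks only the shape of the tree. In arithmetic the collapse is valid only when the shared operator distributes over the outer one, as × does over +. A genuine transformation rule would have to check this condition as well.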

There is an analogy between the dual model presented, comprising both tree 
structures and transformations, and two aspects of language distinguished by 
Jakobson (1956, 1961) and taken up by theorists such as Herriot (1970) and 
Walkerdine (1982). Jakobson characterises language in terms of combination and 
selection. He refers to the process of combination, contextualisation and contiguity 
as metonymy, and those of selection, substitution and similarity as metaphor. This 
distinction mirrors that between expression trees comprising contexts made up of 
combined and contiguous basic symbols, and transformations, which are selected to 
substitute similar substructures in trees. Thus the model accommodates both of 
Jakobson's fundamental aspects of language. 


Jakobson reports that sufferers from aphasia may be deficient in either the 
combination (metonymic) or selection (metaphoric) aspects of language. According
to the analogy proposed above, aphasics with a combination function deficiency
may be expected to have difficulty in understanding and mentally representing
complex symbols, such as multi-digit numerals. Difficulties of just this kind are
frequently observed (Luria, 1966a, 1966b; Farnham-Diggory, 1978) and often
explicitly related to aphasia (Luria, 1969). In contrast, patients with an aphasic 
selection disorder may be expected to have difficulty in performing calculations, 
even when they understand numerals, because of the transformation and 
replacements involved. Disorders of this type are also observed frequently in 
aphasics (Luria, 1966a, 1969; Farnham-Diggory, 1978). Thus, the proposed model is 
consistent with Jakobson's dual view of language and with observations of aphasics 
with dyscalculia. A strength of the model, therefore, is that it proposes distinct 
psychological mechanisms underpinning the two aspects of language. 


There is little available observational evidence that can be used to test the
proposed model further. The use of mental transformations of representations,
especially during mental calculations, can, however, be confirmed (see Hope, 1985).


It has been observed that negative sentences such as ‘‘7 is not even” take longer 
to understand and verify (and presumably to represent mentally) than their positive 
form, such as ‘‘7 is even’’. These and similar findings (Wason, 1961; Miller, 1964) 
suggest a functional relationship between the syntactical complexity of an expression 
and the time taken to represent it mentally. This finding is evidently consistent with
the proposed model, since the more complex an expression, the more stages an
individual needs to go through in “unpacking” it, that is, in representing it mentally.


The model presented for the extraction of meaning includes a monitoring 
component which checks and decides on the ‘‘manageability’’ of mathematical 
expressions. Such checks are also built into the transformation component of the 
model. These checks have an affective component, since their decisions relate to 
persistence and motivation. If for some reason, in some individuals, these 
monitoring functions have a very low threshold, the individuals can be expected to 
give up trying to construct mental representations of mathematical expressions (that 
is, trying to understand the expressions) very quickly. 
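The two preceding paragraphs can be sketched together. Here the number of nodes in an expression tree stands in for the number of unpacking stages, and a hypothetical monitor abandons any expression whose size exceeds its threshold. Both the node-count measure and the threshold mechanism are simplifying assumptions made for illustration.

```python
def node_count(tree) -> int:
    """Count nodes; the operator label at tree[0] belongs to its own node."""
    if isinstance(tree, tuple):
        return 1 + sum(node_count(child) for child in tree[1:])
    return 1

def represent(tree, threshold: int):
    """Hypothetical manageability monitor: construct a representation only if the
    expression's size is within the threshold; otherwise give up (return None)."""
    return tree if node_count(tree) <= threshold else None

positive = ("even", 7)           # "7 is even"
negative = ("not", ("even", 7))  # "7 is not even"

print(node_count(positive), node_count(negative))  # 2 3
print(represent(negative, threshold=5))  # ordinary threshold: proceeds
print(represent(negative, threshold=2))  # very low threshold: gives up
```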


Such behaviour is known as formula or meaning blindness (Williams, 1972). 
The present model is able to provide a rudimentary account of the processes 
underlying this phenomenon. That is, the individual with meaning blindness has too
low an inbuilt manageability threshold, which cuts off any attempt to extract the
meaning from formulas. A similar account can be given of number anxiety (Biggs,
1962) as a lowering, and thus easy triggering, of the difficulty-monitoring threshold
in the use of transformations in calculations.


Overall, the proposed model can be said to be consistent with a substantial 
portion of relevant current theory and observational data. It can also be said to be 
fruitful, as it provides explanations for some well-known observed phenomena. The 
model is not definitive, and is evidently simple. It treats only the representation and 
transformation of linguistic mathematical data. A major omission is that of image 
representation, be it visual, spatial, kinesthetic, auditory, rhythmic, or other. 
Ultimately, an account of the representations associated with the linguistic 
expressions of mathematics must also indicate the role of these further modes of 
representation. However, the present model is offered only as a first approximation 
of the mental processes involved. 


In addition, it can be remarked that the model suggests a number of novel 
classroom approaches. Errors arise both in arithmetic (Blando et al., 1986) and
algebra (Booth, 1984) from students' failure to follow correct rules of precedence. 
These errors may be attributed to a failure by students to discern the hierarchical 
syntactical structure or orders of precedence within mathematical expressions. The 
model presented above may be used to exhibit this structure explicitly to students. 
They can be trained to analyse expressions into written tree forms, and use written 
forms of some of the transformation rules to perform mathematical tasks. An 
empirical investigation of whether such approaches do indeed enhance student 
understanding would be very interesting, not least because of the light it might shed 
on the proposed model. Of course this proposal raises the problem of how the model 
could be empirically tested. No full answer can be given here but a number of 
approaches are possible such as error analyses, time taken to complete tasks when 
structure is manipulated and variation of context to explore effects on 
transformation processes. 


Two final issues need to be addressed. First of all, it needs to be reiterated that 
the model assumes that the knowledge representations attributed to children (and adults)
are isomorphic (at the appropriate stage of development) with the structure of 
mathematical syntax. This is an assumption which needs empirical testing. Indeed it 
will be easier to falsify than to confirm, since performances consistent with any 
given theoretical description can of course be accounted for by alternative 
descriptions. 


Secondly, there is the question of the extent to which the model adequately 
covers mathematical understanding. The proposed model only deals with the 
structural representation of mathematical expressions. It does not deal with the
complexity of the representations and associations of individual symbols, which are
presumably to be located in the knowledge structures of long-term memory. Nor
does the model deal adequately with meaning contexts on a larger scale to which
individual linguistic expressions are only partial contributors. However, what it does
offer is a novel account of how the structure of compound mathematical expressions
is represented cognitively, as well as postulating a sequence for the cognitive
development of such representations. As such it treats one or more of the 
components of understanding in mathematics. 


Correspondence and requests for reprints should be addressed to Dr. Paul Ernest, School 
of Education, University of Exeter, St. Luke's, Exeter, EX1 2LU, England. 
A METHOD OF ITEM-ANALYSIS AND ITEM-SELECTION
FOR THE CONSTRUCTION OF CRITERION-REFERENCED 
TESTS 


By G. M. SEDDON 
(University of East Anglia) 


Summary. The optimum composition of criterion-referenced tests in terms of the 
statistical parameters of the items is determined theoretically. A practical procedure for 
pre-testing and selecting items to achieve this optimum composition is discussed. The 
paper then describes an empirical investigation to determine the usefulness of this 
procedure. The results showed that a test constructed by this procedure was more 
effective than 93 out of 100 tests constructed by selecting items at random from a 
domain. 


INTRODUCTION 

In constructing public examinations, there has been an increasing interest in the use
of criterion-referenced, as opposed to norm-referenced, tests. While both types of
test are intended to assess an examinee's performance on a particular universe or
domain of test items, criterion-referenced tests set out to determine the proportion
of the items in the domain which each examinee can answer correctly, i.e., the true 
score. In contrast, norm-referenced tests merely seek to produce scores which 
correlate as closely as possible with the true scores. 


In constructing criterion-referenced tests, several authors have argued quite 
rightly that items should not be selected by the usual methods used to construct 
norm-referenced tests (Millman and Popham, 1974; Hambleton et al., 1978; 
Hambleton and De Gruijter, 1983). The only alternative method which seems to 
have been suggested completely eschews the statistical approach to item selection 
(Harris et al., 1977; Popham, 1978; Hambleton, 1982). 


“. . . our position is that a random selection of items from the total domain rather 
than one based on item analysis should be used to construct an achievement test" 
(Harris et al., 1977, p. 3). 


The purpose of this paper is to determine a method of item-analysis and selection
which gives more accurate scores than those produced by random sampling. The
method is determined by a theoretical analysis of the problem, and its efficacy is
demonstrated by an empirical investigation.


THEORY 


The discussion will refer to a domain of $K$ items, from which it is intended to
construct a test by selecting a sample of $k$ items. The proportions of the items which
the examinee can answer correctly in the domain and sample, respectively, are given
by $T$ and $X$. No matter what kind of sampling is used, the value of $X$ then differs
from $T$ due to the effects of sampling error. Hence, the aim is to produce a value of
$X$ for each examinee such that $X$ is as close as possible to the corresponding value
of $T$. Over all the $N$ examinees in the population, the extent to which this aim is
achieved can be measured by $\phi$, where

$$\phi = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - T_i\right)^2 \qquad (1)$$

The smaller the value of $\phi$, the better the test.


On expanding the right-hand side of (1) and collecting terms, it is possible to
relate $\phi$ to the means and standard deviations of $X$ and $T$, as well as to the
correlation coefficient $\rho$ between $X$ and $T$:

$$\phi = (\bar{X}-\bar{T})^2 + (\sigma_X-\sigma_T)^2 + 2\sigma_X\sigma_T(1-\rho) \qquad (2)$$
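The equivalence of (1) and (2) can be checked numerically. The sketch assumes population (divisor $N$) moments throughout, which is what makes the identity exact. The scores are invented for illustration.

```python
from math import sqrt

def phi(X, T):
    """Equation (1): mean squared difference between sample scores X and true scores T."""
    return sum((x - t) ** 2 for x, t in zip(X, T)) / len(X)

def mean_sd(v):
    """Mean and population SD (divisor N, matching equation (1))."""
    m = sum(v) / len(v)
    return m, sqrt(sum((a - m) ** 2 for a in v) / len(v))

def phi_from_moments(X, T):
    """Equation (2): (mean gap)^2 + (SD gap)^2 + 2 * sd_X * sd_T * (1 - rho)."""
    mx, sx = mean_sd(X)
    mt, st = mean_sd(T)
    cov = sum((x - mx) * (t - mt) for x, t in zip(X, T)) / len(X)
    rho = cov / (sx * st)
    return (mx - mt) ** 2 + (sx - st) ** 2 + 2 * sx * st * (1 - rho)

# Hypothetical proportion-correct scores for five examinees.
X = [0.60, 0.84, 0.40, 0.72, 0.52]  # on a k-item sample
T = [0.55, 0.78, 0.47, 0.66, 0.58]  # on the whole K-item domain
print(phi(X, T), phi_from_moments(X, T))
```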
The first step in the analysis is to determine the combination of values for $\bar{X}$,
$\rho$ and $\sigma_X$ which minimises $\phi$. The second step concerns the characteristics
of the items which give these optimum values.


In determining the conditions which minimise $\phi$, it must be borne in mind that
many, but not all, practical methods of varying one of $\bar{X}$, $\sigma_X$ and $\rho$ will
bring about concomitant changes in the other two. However, it simplifies the analysis
if the procedures considered here change just one of these three variables at a time.
Thus it is obvious from (2) that $\phi$ decreases as $\bar{X}$ approaches $\bar{T}$, and as
$\rho$ approaches 1. The optimum value of $\sigma_X$ is found from elementary calculus by
differentiating (2) with respect to $\sigma_X$, setting the derivative equal to zero, and
solving for $\sigma_X$. Then

$$\sigma_X = \sigma_T\,\rho \qquad (3)$$
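For completeness, the calculus step can be written out. The second term of (2) is expanded and the terms collected, with $\bar{X}$, $\sigma_T$ and $\rho$ held fixed as the text assumes:

```latex
\begin{aligned}
\phi &= (\bar{X}-\bar{T})^2 + \sigma_X^2 - 2\sigma_X\sigma_T + \sigma_T^2
        + 2\sigma_X\sigma_T - 2\rho\,\sigma_X\sigma_T \\
     &= (\bar{X}-\bar{T})^2 + \sigma_X^2 + \sigma_T^2 - 2\rho\,\sigma_X\sigma_T, \\
\frac{\partial\phi}{\partial\sigma_X} &= 2\sigma_X - 2\rho\,\sigma_T = 0
  \;\Longrightarrow\; \sigma_X = \rho\,\sigma_T,
  \qquad \frac{\partial^2\phi}{\partial\sigma_X^2} = 2 > 0 .
\end{aligned}
```

The positive second derivative confirms that (3) is a minimum. Since $\rho \le 1$, the optimum $\sigma_X$ never exceeds $\sigma_T$.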
When tests are constructed by the random selection of items, the sampling
errors ($E$) will produce values of $\bar{X}$ which can be either greater or smaller than
$\bar{T}$, and the values of $\rho$ will in general be less than the optimum. In the case of
$\sigma_X$, it is a standard result of classical test theory that

$$\sigma_X^2 = \sigma_E^2 + \sigma_T^2$$

Hence, in contrast to the optimum value of $\sigma_X$, which by (3) cannot exceed
$\sigma_T$, the average value of $\sigma_X$ for random item samples will exceed $\sigma_T$.
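A small simulation can illustrate this last point. It assumes a logistic response model (probability correct = logistic(ability + easiness)), which the paper does not use. The sizes are illustrative, apart from the test length of 25 and the 100 random samples, which mirror the later experiment.

```python
import random
from math import exp, sqrt

def pstdev(values):
    """Population standard deviation (divisor N)."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

rng = random.Random(1987)
K, k, N = 200, 25, 300  # domain size, test length, examinees

# Assumed response model: P(correct) = logistic(ability + easiness).
ability = [rng.gauss(0, 1) for _ in range(N)]
easiness = [rng.gauss(0, 1) for _ in range(K)]
responses = [
    [1 if rng.random() < 1 / (1 + exp(-(a + e))) else 0 for e in easiness]
    for a in ability
]

T = [sum(row) / K for row in responses]  # domain (true) scores
sigma_T = pstdev(T)

sigma_X = []
for _ in range(100):  # 100 random item-samples of k items each
    items = rng.sample(range(K), k)
    X = [sum(row[j] for j in items) / k for row in responses]
    sigma_X.append(pstdev(X))

mean_sigma_X = sum(sigma_X) / len(sigma_X)
print(f"sigma_T = {sigma_T:.3f}; mean sigma_X over random samples = {mean_sigma_X:.3f}")
```

Under this model the random samples inflate $\sigma_X$ by the sampling-error variance, whereas (3) asks for a value no larger than $\sigma_T$.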


Ultimately, the values of $\bar{X}$, $\rho$ and $\sigma_X$ are determined by the values of the
parameters of the individual items, i.e., mean ($\pi$), standard deviation ($\sigma$) and point
biserial correlation coefficient ($\beta$), as calculated with reference to the complete
domain. Therefore, the composition of the optimum item sample can be determined
by considering how $\bar{X}$, $\rho$ and $\sigma_X$ vary on changing the values of $\sigma$, $\beta$ and $\pi$.


The value of $\bar{X}$ is simply the average of the values of $\pi$ for all the items in the
test. Hence in the optimum criterion-referenced test

$$\frac{1}{k}\sum_{j=1}^{k}\pi_j = \bar{T} \qquad (4)$$


The relationship between $\rho$ and the item parameters is a well-established part of
classical test theory, and, as in the procedure for constructing norm-referenced tests,
the value of $\rho$ is optimised by selecting items with large values of $\beta$. However, the
effect of this procedure is also to increase $\sigma_X$, as described by the standard
relationship

$$\sigma_X = \frac{1}{k}\sum_{j=1}^{k}\sigma_j\beta_j \qquad (5)$$

Hence in constructing criterion-referenced tests, this concomitant increase in $\sigma_X$ has
to be offset by selecting only those items which also have low values of $\sigma$, so as to
achieve the criterion condition implied by (3), i.e.,

$$\frac{1}{k}\sum_{j=1}^{k}\sigma_j\beta_j < \sigma_T \qquad (6)$$
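Two facts behind (5) to (7) can be checked on simulated data. For a dichotomously scored item, $\sigma = \sqrt{\pi(1-\pi)}$. Over the whole domain, where each $\beta$ is the correlation of an item with $T$ itself, $\sigma_T = (1/K)\sum\sigma_i\beta_i$ holds exactly, because the covariances of the items with $T$ sum to $K\sigma_T^2$. The logistic response model and all sizes below are assumptions made only to generate data.

```python
import random
from math import exp, sqrt

def mean(v):
    return sum(v) / len(v)

def pstdev(v):
    m = mean(v)
    return sqrt(sum((x - m) ** 2 for x in v) / len(v))

def pcorr(u, v):
    """Population Pearson correlation; 0.0 if either variable is constant."""
    su, sv = pstdev(u), pstdev(v)
    if su == 0 or sv == 0:
        return 0.0
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) * su * sv)

rng = random.Random(0)
N, K = 120, 40  # examinees and domain items (illustrative)
ability = [rng.gauss(0, 1) for _ in range(N)]
easiness = [rng.gauss(0, 1) for _ in range(K)]
resp = [[1 if rng.random() < 1 / (1 + exp(-(a + e))) else 0 for e in easiness]
        for a in ability]

T = [sum(row) / K for row in resp]             # domain scores
columns = [[row[i] for row in resp] for i in range(K)]
pi = [mean(c) for c in columns]                # item means
sigma = [sqrt(p * (1 - p)) for p in pi]        # SD of a 0/1 item
beta = [pcorr(c, T) for c in columns]          # point biserial with T

lhs = pstdev(T)
rhs = sum(s * b for s, b in zip(sigma, beta)) / K
print(f"sigma_T = {lhs:.4f}; mean of sigma*beta = {rhs:.4f}")
```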


The overall results of this analysis are conveniently summarised with reference
to Figure 1, in which values of $\beta$ are measured on the vertical axis, and values of $\pi$,
together with their corresponding values of $\sigma$, are represented on the horizontal axis.
The curve joins all points corresponding to items having combinations of $\sigma$ and $\beta$
such that $\sigma\beta = \sigma_T$. Thus the optimum value of $\phi$ is achieved by choosing items
from the top left- and right-hand corners of Figure 1: there, a large $\beta$ raises $\rho$, while
a value of $\pi$ near 0 or 1 keeps $\sigma = \sqrt{\pi(1-\pi)}$, and hence $\sigma\beta$, small. While there are
still many ways of selecting such items, one obvious method of meeting the required
conditions is to choose items from only the two shaded areas, with the relative numbers
from each area determined so as to produce the required value of $\bar{X}$.


FIGURE 1
SCHEMATIC ILLUSTRATION OF HOW $\sigma$, $\beta$ AND $\pi$ MAY BE CHANGED TO OPTIMISE $\rho$ AND $\sigma_T$
[Curve: $\sigma\beta = \sqrt{\pi(1-\pi)}\,\beta = \sigma_T$]


PRACTICAL PROCEDURE 
The practical application of these results must take account of the fact that in the 
vast majority of circumstances the pre-testing procedure will be performed on 
samples, rather than populations, of both examinees and items. As a consequence, it 
is not possible to determine directly the values of $\sigma$, $\beta$, $\pi$, $\bar{T}$ or $\sigma_T$. However, it is
possible to make estimates of these parameters from the item statistics $a$, $b$ and $p$,
which correspond to $\sigma$, $\beta$ and $\pi$ respectively.


While all three of these item statistics are subject to the effects of sampling
examinees, for random samples the values of $p$ are always unbiased estimates of the
values of $\pi$, and, provided there are more than 20 students in an examinee-sample,
the values of $a$ and $b$ are very close approximations to the unbiased estimates of $\sigma$ and
$\beta$ respectively (Olkin and Pratt, 1958; Cureton, 1968). The effects of item-sampling
impinge only on the values of $b$, because for any one item this statistic is determined
by the number and nature of the other items in the test. In fact, the values of $b$ will
generally be somewhat greater than the corresponding values of $\beta$, because
the effect of an item correlating with itself will be greater in a sample than in the
whole domain. However, for random samples, it is reasonable to assume that those
items with large values of $\beta$ will generally produce large values of $b$. Therefore, the
overall conclusions which apply in theory to the values of the three item parameters
can be applied in practice to the item statistics.
These item statistics can also be used to estimate $\bar{T}$ and $\sigma_T$. Thus the average
of all the values of $p$ will be an unbiased estimate of $\bar{T}$. Since the values of $\beta$ will
in general be less than the corresponding values of $b$,

$$\sigma_T = \frac{1}{K}\sum_{i=1}^{K}\sigma_i\beta_i < \frac{1}{K}\sum_{i=1}^{K}a_ib_i \qquad (7)$$

and an upper bound for $\sigma_T$ can be calculated as the average value of $ab$ for all the
items. Hence, the sample of $k$ items chosen for the optimum test must meet the
condition which follows from (6):

$$\frac{1}{k}\sum_{j=1}^{k}a_jb_j < \frac{1}{K}\sum_{i=1}^{K}a_ib_i \qquad (8)$$

or, in words, the average value of $ab$ in the sample must be less than the average value
of $ab$ in the whole domain.
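The practical criterion lends itself to a simple check on a proposed test. The sketch below tests condition (8) and an approximate version of (4), using $p$ as the estimate of $\pi$. The function name, the tolerance `tol` and the six-item statistics are hypothetical.

```python
from math import sqrt

def satisfies_conditions(a, b, p, chosen, tol=0.01):
    """Check a candidate test against (8) and an approximate form of (4).

    a, b, p -- pre-test SD, point biserial and proportion correct for all K items
    chosen  -- indices of the k items proposed for the test
    tol     -- hypothetical tolerance for matching mean p to the domain mean
    """
    K, k = len(a), len(chosen)
    domain_ab = sum(ai * bi for ai, bi in zip(a, b)) / K
    sample_ab = sum(a[j] * b[j] for j in chosen) / k
    domain_p = sum(p) / K
    sample_p = sum(p[j] for j in chosen) / k
    return sample_ab < domain_ab and abs(sample_p - domain_p) <= tol

# Hypothetical pre-test statistics for a six-item domain.
p = [0.05, 0.95, 0.50, 0.50, 0.60, 0.40]
a = [sqrt(pi * (1 - pi)) for pi in p]  # SD of a 0/1 item
b = [0.50, 0.50, 0.30, 0.35, 0.30, 0.30]

print(satisfies_conditions(a, b, p, [0, 1]))  # extreme p, high b -> True
print(satisfies_conditions(a, b, p, [2, 3]))  # middling p -> False
```

The pair of items with extreme $p$ values satisfies both conditions because their small $a$ values hold the average of $ab$ below the domain average, even though their $b$ values are high.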


EMPIRICAL INVESTIGATION 


The overall purpose was to determine whether the conclusions derived from the 
previous analysis actually do apply to a real situation. First, it was intended to 
investigate the validity of the conclusions concerning the optimum combination of 
values for $\sigma$, $\beta$ and $\pi$. Secondly, it was intended to determine whether the combined
effects of item- and examinee-sampling in a pre-testing procedure will still allow 
items to be selected with sufficient accuracy to produce a test which is significantly 
better than those constructed by selecting items at random from the whole domain. 


Method 

The basic plan, summarised in Figure 2, required a domain of items and a large
group of examinees. After the examinees had been divided at random into two groups,
one was to be considered as an examinee-population or validation group. The other was
to be subdivided further into examinee-samples. The examinee-population was to
work through all the items in the domain. The results were then to be used in
calculating the values of $\sigma$, $\beta$ and $\pi$ for every item. The examinee-samples were to be
involved in a pre-testing procedure in which a different randomly selected item-sample
was administered to each examinee-sample. The results were to be used in obtaining
the values of $a$, $b$ and $p$ for every item in the domain. Both the item parameters and
the item statistics were then to be used in compiling tests designed to investigate the
various points of interest. In all cases, the values of $\rho$, $\sigma_X$, $\bar{X}$ and $\phi$ were calculated
using the values of both $T$ and $X$ determined from item scores obtained on
administering the whole item-domain to the whole examinee-population.


The validity of the conclusions from the theoretical analysis was to be
investigated in two ways. First, the item parameters were to be used in constructing the
optimum test (Test $\sigma\beta\pi$), and its value of $\phi$ compared with those of 100 item-samples
drawn at random from the whole domain. Secondly, this value of $\phi$ was to be
compared with those of item-samples obtained by successively replacing the items in
Test $\sigma\beta\pi$ in such a way that particular item parameters change. In this way, it is
possible to determine whether the values of each item parameter in Test $\sigma\beta\pi$ were
optimum or not.


The effects of item-sampling were to be investigated by compiling the predicted
optimum test on the basis of the item statistics $a$, $b$ and $p$. The value of $\phi$ for this test
(Test abp) was then to be compared with those from the 100 randomly selected
item-samples.




FIGURE 2 
FLOWCHART TO ILLUSTRATE THE SAMPLING PLANS FOR THE EXPERIMENTAL DESIGN 





The examinees. Since the theory makes no assumptions about the educational 
level or range of abilities of the examinees, the examinees can be taken from any 
group of students. In the present investigation, the major concern was to recruit a 
sufficiently large number of examinees, and in achieving this aim, no attempt was 
made to confine the examinees to a particular educational level. In fact, a total
of 493 examinees was taken from 11 secondary schools and one university. All the
examinees were following chemistry courses in one of four different year groups or 
educational levels. The ages of the examinees ranged from 15 to 19 years. 


The item-domain. Since the nature of the item-domain does not bear upon the 
validity of the statistical aspects of the overall theory, virtually any domain of items 
could be used. In the present experiment, the chosen domain comprised 200 
dichotomously scored constructed-response items, each asking a different factual 
question about the chemical elements. 


The item- and examinee-sampling procedures. The size of the item-samples was 
set at 25 items. The number of examinees used in each item-examinee sample was
set at a nominal value of 30 in order to give unbiased estimates of $\sigma$ and $\beta$ from the
values of $a$ and $b$ respectively.


Administration. The whole pre-testing procedure was to involve a total of 240
examinees in the eight examinee-samples. This left 253 for the validation group.
Since the examinees were taken from different institutions, the examinees in
each examinee-sample and in the validation group were distributed between the
institutions. After allocating the items at random to the item-samples, the different
item-samples and the items for the whole domain were each printed on A4-sized pages in
their own respective booklets. Then, the required number of booklets for each
item-sample and for the whole domain were distributed in a completely random fashion
throughout all the examinees, irrespective of the institution to which they belonged.
All examinees took the test within the same one-month period.


In administering the test, the examinees for the validation group and examinee- 
samples wrote their answers in spaces provided in the question booklet. There was 
no time limit. In fact, those examinees in the validation group completed the whole 
domain in times varying from 30 to 60 minutes. In contrast, examinees in the 
examinee-samples completed their item-samples in less than ten minutes. In order 
not to disturb those who were still working, these examinees were also given a further 
set of chemistry questions which were not considered in the final analysis. 


The experimental tests. All the experimental tests comprised 25 items. In
constructing Tests $\sigma\beta\pi$ and abp, it is not necessary that every item should meet the
condition $\sigma\beta < \sigma_T$ or $ab < \overline{ab}$, where $\overline{ab}$ is the average value of $ab$ over the
whole domain. It is necessary only for the average values of these products to be less
than $\sigma_T$ and $\overline{ab}$ respectively. Thus, in the samples actually chosen, about half of the
items had values of $\sigma\beta$ or $ab$ greater than $\sigma_T$ or $\overline{ab}$, respectively.


Each of the methods of changing the items in Test $\sigma\beta\pi$ was carried out in
successive stages until as many items as possible had been replaced. In one method
there were changes in the values of $\pi$ only, and each stage replaced one item by
another with a lower value of $\pi$, but with values of $\sigma$ and $\beta$ constant to within ±0.01.
In corresponding fashion, a second method concerned changes only in $\beta$. A third
method emphasised changes in $\sigma$, but could not be achieved without concomitant
changes in $\pi$. In practice, the procedure increased the values of $\sigma$ while keeping
the average value of $\pi$ (i.e., $\bar{X}$) constant to within ±0.001. This aim was achieved by
always replacing two items at a time, one from each side of the point corresponding
to $\pi = 0.5$.


Results 
The total number of scripts returned from examinees allocated to the validation group was 219. The whole domain had a mean of 0.59 and a standard deviation of 0.16. The values of α, β and τ for all 200 items are summarised in Figure 3, where the items chosen for Test αβτ are plotted as open circles.


The resulting characteristics of this test, together with those of the complete domain and the other item-samples, are summarised in Table 1. All of the 100 randomly selected item-samples had values of φ greater than that of this final test.


The results of replacing the items in this test with items having different values of α and β are summarised in Figure 4. As the values of α increase, and as the values of β and τ decrease, there is an overall increase in the value of φ.


All the items selected on the basis of the results from the item-examinee samples had values of p in either the range 0.01 to 0.18 or the range 0.82 to 0.99. The items with the higher difficulty indices had values of b in the range 0.41 to 0.67; those with the lower difficulty indices had values of b in the range 0.11 to 0.58. No randomly selected item-sample had a value of φ less than that of Test abp. However, six random samples had values of φ equal to that of this experimental test.
FIGURE 3
SCATTERGRAM OF β AGAINST α AND τ FOR ALL THE ITEMS IN THE DOMAIN
(○ = item used in Test αβτ)





TABLE 1
STATISTICAL PROPERTIES OF THE VARIOUS TESTS (k = 25)

Item sample    X̄       σ_X     ρ       φ
Random         0.59*   0.18*   0.92*   0.008*
Test αβτ       0.59    0.16    0.94    0.003
Test abp       0.59    0.15    0.92    0.004

* Mean values for 100 samples
FIGURE 4
GRAPHS TO SHOW HOW φ CHANGES ON REPLACING THE ITEMS IN THE OPTIMUM TEST
[Curves labelled "Decreasing τ", "Increasing α" and "Decreasing β"; horizontal axis marked 0 to 25.]

Discussion

Table 1 shows that, for Test αβτ, the measures taken to produce a mean equal to the domain mean have been remarkably successful: the two means are identical to two decimal places. The value of ρ exceeded the values of ρ obtained for all 100 randomly selected tests. Despite the attempts to produce a value of σ_X less than σ_T, however, the obtained value of σ_X is equal to σ_T to two decimal places.


When the overall effectiveness of Test αβτ is considered, its value of φ is seen to be better than that of every one of the 100 randomly selected item-samples. A different perspective on the magnitude of these effects is obtained by considering the standard error of measurement, σ_E, when X and T are expressed as percentages rather than proportions, i.e., by multiplying the square root of φ by 100. For the randomly selected item-samples the mean of σ_E is 8.9 per cent; for Test αβτ, σ_E = 5.5 per cent. Thus selecting the items according to the procedure proposed here reduces the standard error of measurement by nearly 40 per cent.
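The conversion from φ to a percentage standard error can be checked with a short Python sketch. It assumes, as the text implies, that σ_E in percentage points equals 100√φ, and it takes the φ values from Table 1; the function name is illustrative only.

```python
import math


def sem_percent(phi: float) -> float:
    """Standard error of measurement in percentage points.

    Assumes phi is the mean squared error on the proportion scale,
    so that sigma_E = 100 * sqrt(phi), as described in the text.
    """
    return 100 * math.sqrt(phi)


# phi values copied from Table 1
random_phi = 0.008   # mean over the 100 random item-samples
optimum_phi = 0.003  # Test alpha-beta-tau

print(round(sem_percent(random_phi), 1))   # 8.9
print(round(sem_percent(optimum_phi), 1))  # 5.5
```

Dividing the two results (about 0.62) gives the proportional size of σ_E for the selected test relative to the random item-samples.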


Figure 4 shows that, as predicted, there is an overall increase in the values of φ as items are replaced by others having different values of α, β or τ. The curve representing changes in β does show occasional decreases rather than a continuous increase, and thereby reflects some limitation in the validity of the theory relating to the effects on σ_X. However, despite these minor discrepancies, the overall procedure for the optimisation of φ is essentially successful.


The shape of the β-curve also shows that the effects of changing the values of β are much smaller for the first few item replacements than for those made later. However, the mathematical theory does not allow such detail to be predicted, and it is not clear whether such a trend will always occur.


Turning to the effects of sampling errors during the pre-testing procedure, Table 1 shows that Test abp has a mean equal to the domain mean, and a value of σ_X which is actually equal to the optimum value indicated by (4), i.e., ρσ_T = 0.92 × 0.16 = 0.15. Its value of ρ, however, only equals the mean of those obtained for random sampling, so the sampling of examinees and items is clearly having an effect here. Nevertheless, the obtained value of ρ is still greater than most of the values of ρ obtained from random sampling. The resulting value of φ for Test abp is no longer better than that of all 100 randomly selected tests. However, it is better than the vast majority, and only a few random tests (six) are as good.


CONCLUSIONS 


Both the theoretical analysis and the empirical investigation indicate that the 
random selection of items is not the most effective way of constructing tests aimed at 
determining the domain score of each examinee. Moreover, the empirical investi- 
gation supports the overall predictions of the theoretical analysis as regards the 
optimum method of item-selection. 


The empirical investigation also shows that the opposing effects of item-examinee sampling in the pre-testing procedure need not be overwhelming. It remains for further investigations to establish the lower limits for the sizes of the item-examinee samples.


Correspondence and requests for reprints should be addressed to Dr. G. M. Seddon, 
Chemical Education Sector, School of Chemical Sciences, University of East Anglia, 
Norwich, NR4 7TJ.
A DISCRIMINATION INDEX FOR CRITERION-REFERENCED
TEST ITEMS 


By THOMAS R. BLACK 
(Department of Educational Studies, University of Surrey) 


Summary. A new index for criterion referenced test items is proposed that takes into account disparity in the sizes of the above- and below-criterion groups (i.e., the shape of the score distribution). Derivations are presented based upon possible discriminations as well as a probability model. This index is contrasted with Brennan's index, which ignores group sizes. A comparison of the values of the two indices for a set of items showed a high degree of disagreement in interpretation, the new index apparently being the better indicator.


INTRODUCTION 


With the advent of the microcomputer, it is now relatively easy for teachers to carry out item analyses, including analyses of discriminating power, on their own classroom tests. Such analyses are routinely carried out by examination boards on trial papers when preparing subject examinations (Nuttall and Willmott, 1972). In addition, the opportunity arises to check the discriminating power of questions used for routeing and in post-tests in computer assisted learning courseware. In all three examples, the tests may be criterion referenced rather than norm referenced.


The index most easily employed for norm referenced tests, which assumes a normal distribution of scores, is directly proportional to the difference between the numbers of correct and incorrect discriminations made by an item. Usually, using the upper 27 per cent and lower 27 per cent performance groups, a value for an item is calculated from the formula

$$D = \frac{U - L}{N} \qquad (1)$$

where U = number in upper 27 per cent getting the item correct
L = number in lower 27 per cent getting the item correct
N = 27 per cent of all students taking the test


The possible values of D range from +1.00 (indicating a perfectly discriminating item) through 0.00 (a non-discriminating item) to −1.00 (indicating equally perfect discrimination in the wrong direction, in which all those in the upper group failed and all in the lower group answered correctly). Values of D are not independent of item facility but are biased in favour of items having intermediate facility levels. The rationale for this approach has been widely discussed (Johnson, 1951; Findley, 1956; Engelhart, 1965), and the index D is frequently recommended in test and measurement textbooks (Mehrens and Lehmann, 1978; Anastasi, 1976; Gronlund, 1981).
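As an illustration of formula (1), the following Python sketch sorts students by total score, takes the top and bottom 27 per cent, and computes D for one item. The function name, the rounding of the group size and the arbitrary handling of tied scores are assumptions of this sketch rather than part of the method as described.

```python
def discrimination_d(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index D = (U - L) / N, formula (1).

    total_scores: total test score for each student
    item_correct: 1 if the student answered this item correctly, else 0
    fraction: proportion of students placed in each extreme group
    """
    if len(total_scores) != len(item_correct):
        raise ValueError("one item response is needed per student")
    n = max(1, round(fraction * len(total_scores)))  # N, the size of each group
    # Order students from lowest to highest total score (ties keep input order).
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    U = sum(item_correct[i] for i in upper)  # correct answers in upper group
    L = sum(item_correct[i] for i in lower)  # correct answers in lower group
    return (U - L) / n


scores = [3, 5, 6, 7, 8, 9, 10, 12, 14, 15]
item = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
print(discrimination_d(scores, item))  # 2/3, i.e. about 0.67
```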


The application of this index to criterion referenced tests, whose scores are often negatively skewed, has not been favoured (Popham and Husek, 1969), though the principle of applying a more relevant index of discriminating power is accepted. There are a number of ways of transforming a set of scores to achieve a normal distribution (Ferguson, 1976), which could be carried out before performing the above calculations, but such a procedure assumes that underlying the original non-normal distribution is a trait or characteristic that should generate a normal one (Guilford and Fruchter, 1973). For example, it could be argued that if a truly random sample of people took a test, regardless of whether or not they had undergone a corresponding learning experience, formal or otherwise, a normal distribution of scores would result; on this view, normalising the scores in the skewed distribution would be justified. But it could also be argued that such a situation is not typical of the use of criterion referenced tests. The prime concern of users of such tests tends to be to determine achievement of specific knowledge and skills, which belies the assumption of an underlying normal distribution of some trait. Though criterion referenced testing is used in a variety of contexts, the shape of the score distribution is usually assumed to result from the definition of the criterion and the quality of the instruction, if any. The author considers the second argument the stronger; therefore an index that is free of assumptions about any underlying distribution shape is needed.


Brennan (1972) proposed an index, B, which was claimed to be suitable for use 
with criterion referenced test results. But when this author calculated B for 
numerous items from a series of criterion referenced modular post-tests, several 
problems arose, which this new index is intended to overcome. 


BRENNAN'S INDEX 
Brennan's (1972) index is calculated as follows:

$$B = \frac{U}{N_1} - \frac{L}{N_2} \qquad (2)$$

where U = number in upper group getting the item correct
L = number in lower group getting the item correct
N₁ = number in upper group
N₂ = number in lower group


It is conceptually described in much the same way as the traditional norm referenced index, D: B is the difference between the proportions of the upper and lower groups who get the item correct. Obviously, when N₁ = N₂ = 27 per cent of all students, B = D. The values of B also range from +1.00 to −1.00, but are interpreted rather differently from those of D, as described below.


It is assumed that Brennan uses all students above a single criterion point for N₁ and all those below it for N₂. As Berk (1978) has noted, B has no finite value if N₁ or (more likely) N₂ equals zero. A situation could arise involving a highly skewed distribution where, for example, N₁ = 30 and N₂ = 6 (i.e., out of 36 students, 30 achieved mastery on the test). If for a given item U = 25 and L = 6, indicating a high success rate in both groups, then B is close to zero. The six students have a greater influence on the index than the 25, and the item is labelled an adequately discriminating item, even with B close to zero, when considered in the light of the facility index of 0.86. (Brennan's unorthodox decision criteria are discussed in detail later.) Though it could be argued that the numbers are representative of larger groups and that this is therefore a valid judgment, intuitively it is felt that this should not be classified as a good discriminating item.
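The numerical example can be reproduced with a minimal Python version of formula (2). The function name is hypothetical; raising an error for an empty group reflects Berk's point that B then has no finite value.

```python
def brennan_b(U, L, N1, N2):
    """Brennan's index, formula (2): B = U/N1 - L/N2."""
    if N1 == 0 or N2 == 0:
        raise ValueError("B is undefined when the upper or lower group is empty")
    return U / N1 - L / N2


N1, N2, U, L = 30, 6, 25, 6               # the example in the text
print(round(brennan_b(U, L, N1, N2), 2))  # -0.17
print(round((U + L) / (N1 + N2), 2))      # 0.86, the facility index
```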


The author has used modular criterion referenced tests for an introductory in-service course in statistics for teachers. The tests were computer marked, and individualised feedback was generated through a computer managed learning package. In such a system, the test designer is not restricted to a single cut-off point, but can easily use two cut-off scores, thus dividing student performance into three levels: mastery, doubtful, and non-mastery. This strategy takes into account testing error, which would otherwise affect the accuracy of a binary mastery/non-mastery decision. When calculating a discrimination index, this is analogous to using the conventional upper/lower 27 per cent split, which ignores the middle group.
When Brennan's index was applied to items of tests that produced negatively skewed results, the non-mastery group (N₂) was small and occasionally empty. It was felt that the disproportionate influence of the small lower group could be neutralised by weighting the influence of each group according to its size. Other situations surely exist where large differences in group sizes arise from the use of criterion referenced tests.


A NEW INDEX 

The resulting index, E, is described as the difference between the proportion of students answering as expected (i.e., those in the upper group answering the item correctly and those in the lower group answering it wrongly) and the proportion answering contrary to expectation (i.e., those in the upper group answering it wrongly and those in the lower group answering it correctly):

$$E = \frac{U + (N_2 - L)}{N_1 + N_2} - \frac{L + (N_1 - U)}{N_1 + N_2}$$

where, in addition to the previously defined variables,

N₂ − L = those in the lower group answering the item wrongly
N₁ − U = those in the upper group answering the item wrongly

By placing all over a common denominator and simplifying, this can be reduced to

$$E = \frac{2(U - L) + N_2 - N_1}{N_1 + N_2} \qquad (4)$$

It is important to note that if N₁ = N₂, the formula reduces to that of D, formula (1), as does B, formula (2).


It is also possible to think of E as an expression of the difference in probabilities of correct and incorrect classification:

E = p(correct) − p(incorrect).

A two by two contingency table, as shown in Figure 1, for a representative trial group would provide data for a value of E, which conceptually is described as

$$E = \frac{\text{correct: upper} + \text{incorrect: lower}}{\text{total}} - \frac{\text{incorrect: upper} + \text{correct: lower}}{\text{total}} \qquad (3)$$


Substituting the values from the numerical example above into equation (4), it is found that E = 0.39. Considering that, out of 36 students, 25 performed as expected and 11 performed contrary to expectation, this value of E seems more indicative of the quality of the item (i.e., it needs inspection and possibly rewriting) than that of B, which indicates that the item is acceptable.
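A matching Python sketch computes E both from the expected and contrary counts and from the simplified formula (4), and applies both to the same example; the function names are again only illustrative. Unlike B, this expression stays finite when one group is empty, as long as N₁ + N₂ > 0.

```python
def index_e_counts(U, L, N1, N2):
    """E as (students answering as expected - students answering contrary) / total."""
    as_expected = U + (N2 - L)  # upper group right, lower group wrong
    contrary = L + (N1 - U)     # lower group right, upper group wrong
    return (as_expected - contrary) / (N1 + N2)


def index_e(U, L, N1, N2):
    """Simplified form, formula (4): E = (2(U - L) + N2 - N1) / (N1 + N2)."""
    return (2 * (U - L) + N2 - N1) / (N1 + N2)


N1, N2, U, L = 30, 6, 25, 6
print(round(index_e_counts(U, L, N1, N2), 2))  # 0.39
print(round(index_e(U, L, N1, N2), 2))         # 0.39
```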


FIGURE 1
CONTINGENCY TABLE FOR GROUPS VERSUS CLASSIFICATION
[Two by two table: upper and lower groups against classification (correct, incorrect).]
This new index can also be arrived at by a different approach. Findley (1956) showed that D could be obtained by considering all the possible discriminations, and Brennan (1972) showed by the same method how B could be obtained. Index E can be found by the same method, but with an important difference.


First, Findley's and Brennan's arguments will be examined. If there were N persons in the upper 27 per cent and N in the lower 27 per cent, there would be N × N possible discriminations. For Brennan's index, if there were N₁ in the upper group and N₂ in the lower group, there would be N₁ × N₂ possible discriminations. Considering the problem from Brennan's viewpoint, of which Findley's is the special case N₁ = N₂ = 0.27N, the proportion of correct discriminations can be calculated. Using the same notation as above with some additions:

U(N₂ − L) = number of correct discriminations
L(N₁ − U) = number of incorrect discriminations

then

$$B = \frac{U(N_2 - L)}{N_1 N_2} - \frac{L(N_1 - U)}{N_1 N_2} \qquad (5)$$

which reduces to Brennan's original equation (2). But there is an assumption that a third type of discrimination can be ignored. First note that

$$U(N_2 - L) + L(N_1 - U) \neq N_1 N_2$$

Two non-discriminations cover all the other possibilities. These are

UL and (N₁ − U)(N₂ − L)    (non-discriminations)

These discriminate neither correctly nor incorrectly, but can be used in a weighting factor, added to Brennan's index, that takes into account disproportionate upper and lower group sizes:

$$E = \frac{U(N_2 - L)}{N_1 N_2} - \frac{L(N_1 - U)}{N_1 N_2} + \left(\frac{N_1 - N_2}{N_1 + N_2}\right)\left(\frac{UL}{N_1 N_2} - \frac{(N_1 - U)(N_2 - L)}{N_1 N_2}\right) \qquad (6)$$

The weighting factor may best be described as a proportional difference in group sizes times the difference in the proportions of the two non-discriminations. Multiplying the first term by (N₁ + N₂)/(N₁ + N₂), which is equal to one and consequently does not change the value, the equation reduces to Equation 4. If N₁ = N₂, this again reduces to D, Equation 1.
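Equations (5) and (6) can be checked with exact rational arithmetic. The Python sketch below counts the four kinds of upper/lower pairs, confirms that they exhaust the N₁N₂ possibilities, and verifies that B plus the weighting factor equals formula (4). It assumes both groups are non-empty, and the helper names are hypothetical.

```python
from fractions import Fraction


def pair_counts(U, L, N1, N2):
    """Split the N1*N2 upper/lower pairs into the four kinds described in the text."""
    correct = U * (N2 - L)            # upper right, lower wrong
    incorrect = L * (N1 - U)          # lower right, upper wrong
    both_right = U * L                # non-discrimination
    both_wrong = (N1 - U) * (N2 - L)  # non-discrimination
    return correct, incorrect, both_right, both_wrong


def e_from_weighting(U, L, N1, N2):
    """Brennan's B, equation (5), plus the weighting factor of equation (6)."""
    correct, incorrect, both_right, both_wrong = pair_counts(U, L, N1, N2)
    pairs = N1 * N2
    b = Fraction(correct - incorrect, pairs)
    weight = Fraction(N1 - N2, N1 + N2) * Fraction(both_right - both_wrong, pairs)
    return b + weight


U, L, N1, N2 = 25, 6, 30, 6
assert sum(pair_counts(U, L, N1, N2)) == N1 * N2
print(e_from_weighting(U, L, N1, N2))  # 7/18, i.e. 14/36 as from formula (4)
```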


To best illustrate the differences between the indices B and E, it is necessary to express both as functions of only two variables. Those chosen were the ratios of the number of students getting the item correct to the total in each group: P = U/N₁ and Q = L/N₂. Thus Equation 2 becomes

$$B(P,Q) = P - Q \qquad (7)$$

and Equation 4 becomes

$$E(P,Q) = \frac{(N_1/N_2)(2P - 1) - (2Q - 1)}{N_1/N_2 + 1} \qquad (8)$$

While B(P,Q) is independent of the ratio of group sizes, E(P,Q) is not. Figure 2 shows B(P,Q) = E(P,Q) for N₁ = N₂, and E(P,Q) for N₁/N₂ = 2.0, 5.0, 0.5 and 0.2.


From Figure 2 it can be seen that all the functions are planes in a box, and that all the functions have the points (1, 0, 1) and (0, 1, −1) in common, i.e., both indices have possible ranges of −1.0 to +1.0. The effect of the ratio N₁/N₂ is to rotate the plane around a diagonal through the two fixed points. This simply illustrates how E is influenced by the relative sizes of the upper and lower groups, whereas B is not.

FIGURE 2
THREE-DIMENSIONAL GRAPHS OF INDICES VERSUS P AND Q
(a) Brennan's B for all N₁/N₂ (also E when N₁/N₂ = 1.0); (b) E when N₁/N₂ = 2.0; (c) E when N₁/N₂ = 5.0; (d) E when N₁/N₂ = 0.5; (e) E when N₁/N₂ = 0.2.
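Equations (7) and (8) are easy to explore numerically. This Python sketch (function names hypothetical) confirms that, for any ratio N₁/N₂, E passes through the fixed points (1, 0, 1) and (0, 1, −1), and that E coincides with B when the groups are equal.

```python
def b_pq(P, Q):
    """Equation (7): Brennan's B in terms of P = U/N1 and Q = L/N2."""
    return P - Q


def e_pq(P, Q, ratio):
    """Equation (8): E in terms of P, Q and ratio = N1/N2."""
    return (ratio * (2 * P - 1) - (2 * Q - 1)) / (ratio + 1)


for ratio in (0.2, 0.5, 1.0, 2.0, 5.0):
    # Every plane shares the points (P, Q, index) = (1, 0, 1) and (0, 1, -1).
    assert abs(e_pq(1, 0, ratio) - 1) < 1e-12
    assert abs(e_pq(0, 1, ratio) + 1) < 1e-12

# With equal groups the E plane is the B plane.
print(e_pq(0.75, 0.25, 1.0), b_pq(0.75, 0.25))  # 0.5 0.5
```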


THE INFLUENCE OF ITEM FACILITY 
A question which can be asked about any discrimination index is: does it favour items of certain levels of facility, and if so, which? Facility is defined as the proportion of all the students answering an item correctly. An estimate of this can be calculated by considering the upper and lower groups alone (if there is a third, middle group) as

$$F = \frac{U + L}{N_1 + N_2}$$

Solving for U, this becomes

$$U = -L + F N_1 + F N_2$$

Recalling the definitions of P and Q given earlier, P = U/N₁ and Q = L/N₂, and substituting into the above equation, a relationship between P and Q can be found:

$$P = -\frac{Q}{N_1/N_2} + F\left(1 + \frac{1}{N_1/N_2}\right) \qquad (9)$$

Equation 9 presents P as a function of Q which, if plotted in two dimensions, would be a family of straight lines of the form P = mQ + a, with gradient m = −1/(N₁/N₂) and P-axis intercept a = F(1 + 1/(N₁/N₂)).

If this were plotted in three dimensions with B or E as the third variable, it would appear as a vertical plane intersecting the plane described by B or E. Figure 3 shows the plane for N₁/N₂ = 2.0 and F = 0.4. The orientation of the plane changes, as indicated by the angle θ, which increases as N₁/N₂ increases.
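The line of equation (9) can be generated directly. In the following Python sketch (names hypothetical), points on the line for N₁/N₂ = 2.0 and F = 0.4, the case shown in Figure 3, are checked against the facility estimate F = (U + L)/(N₁ + N₂) rewritten in terms of P and Q.

```python
def p_on_line(Q, F, ratio):
    """Equation (9): P as a function of Q for facility F and ratio = N1/N2."""
    gradient = -1 / ratio
    intercept = F * (1 + 1 / ratio)
    return gradient * Q + intercept


def facility(P, Q, ratio):
    """F = (U + L)/(N1 + N2) with U = P*N1 and L = Q*N2, divided through by N2."""
    return (P * ratio + Q) / (ratio + 1)


for Q in (0.0, 0.25, 0.5, 1.0):
    P = p_on_line(Q, F=0.4, ratio=2.0)
    assert abs(facility(P, Q, 2.0) - 0.4) < 1e-12  # F is constant along the line
    print(Q, round(P, 3))
```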


From such a graph it is possible to determine the range of values of B, which can be ascertained from the line of intersection of the two planes. In Figure 3, B can range from −0.90 to +0.60, depending on the values of P and Q. To determine the range of values of E for F = 0.40, it would be necessary to look at the intersection of the P(Q) plane with the E(P,Q) plane illustrated in Figure 2(b), which gives a range of −0.88 to +0.47 for E. Using this approach, ranges of values of B and E for a variety of ratios N₁/N₂ and values of F are given in Table 1. The exemplar values of F for N₁/N₂ = 2.0 and N₁/N₂ = 5.0 include some numbers at apparently anomalous intervals. These were chosen to show the values of F at which B and E reach +1.00 and −1.00, since for skewed distributions these extremes no longer occur at F = 0.5.

FIGURE 3
INTERSECTION OF PLANE OF EQUATION OF B(P,Q) WITH PLANE OF P(Q),
WHERE N₁/N₂ = 2.0 AND F = 0.4

TABLE 1
RANGES OF POSSIBLE VALUES OF DISCRIMINATION INDICES B AND E DEPENDING UPON
FACILITY ESTIMATES AND UPPER/LOWER GROUP SIZES

                         B                  E
N₁/N₂    F         min      max       min      max
1.0      1.0        0        0         0        0
         0.8       −0.40     0.40     −0.40     0.40
         0.6       −0.80     0.80     −0.80     0.80
         0.5       −1.00     1.00     −1.00     1.00
         0.4       −0.80     0.80     −0.80     0.80
         0.2       −0.40     0.40     −0.40     0.40
         0          0        0         0        0
2.0      1.0        0        0         0.33     0.33
         0.8       −0.30     0.60     −0.07     0.72
         0.67      −0.50     1.00     −0.33     1.00
         0.6       −0.60     0.90     −0.47     0.80
         0.4       −0.90     0.60     −0.88     0.47
         0.33      −1.00     0.50     −1.00     0.33
         0.2       −0.60     0.30     −0.73     0.07
         0          0        0        −0.33    −0.33
5.0      1.0        0        0         0.67     0.67
         0.83      −0.20     1.00      0.33     1.00
         0.8       −0.24     0.96      0.27     0.93
         0.6       −0.48     0.72     −0.13     0.53
         0.4       −0.72     0.48     −0.52     0.13
         0.2       −0.96     0.24     −0.93    −0.27
         0.17      −1.00     0.20     −1.00    −0.33
         0          0        0        −0.67    −0.67


It is apparent that B and E respond to the facility of an item in much the same way in terms of potential ranges of values. Again the difference is attributable to the ratio N₁/N₂: the greater it is, the greater the difference between the ranges of B and E. It should be noted that with empirical data, for which the facility calculation is based upon all persons answering the item instead of just those in the upper and lower groups, the ranges may well exceed the values shown in Table 1.
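Because B and E are linear in P and Q, their extreme values along the line of equation (9) occur at the ends of the segment that stays inside the unit square (0 ≤ P, Q ≤ 1). The Python sketch below uses that observation, an analytical shortcut rather than the graphical method described in the text, to compute ranges like those in Table 1; the function names are hypothetical. It reproduces, for example, the row N₁/N₂ = 5.0, F = 0.6 (B from −0.48 to 0.72, E from −0.13 to 0.53), and for the worked case N₁/N₂ = 2.0, F = 0.4 gives B from −0.90 to 0.60 and E from about −0.87 to 0.47, close to the range read from the figure.

```python
def e_pq(P, Q, ratio):
    """Equation (8), with ratio = N1/N2."""
    return (ratio * (2 * P - 1) - (2 * Q - 1)) / (ratio + 1)


def index_ranges(F, ratio):
    """Return ((B_min, B_max), (E_min, E_max)) along the line of equation (9).

    Assumes 0 <= F <= 1, so the part of the line inside the unit square is non-empty.
    """
    intercept = F * (1 + 1 / ratio)              # P when Q = 0
    q_low = max(0.0, (intercept - 1) * ratio)    # below this Q, P would exceed 1
    q_high = min(1.0, intercept * ratio)         # above this Q, P would be negative
    b_values, e_values = [], []
    for Q in (q_low, q_high):                    # a linear function peaks at the ends
        P = intercept - Q / ratio
        b_values.append(P - Q)                   # equation (7)
        e_values.append(e_pq(P, Q, ratio))
    return (min(b_values), max(b_values)), (min(e_values), max(e_values))


for ratio, F in ((5.0, 0.6), (2.0, 0.4)):
    (b_lo, b_hi), (e_lo, e_hi) = index_ranges(F, ratio)
    print(ratio, F, round(b_lo, 2), round(b_hi, 2), round(e_lo, 2), round(e_hi, 2))
```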


JUDGMENTS BASED UPON DISCRIMINATION INDICES 


Brennan (1972) uses both B and F when considering the quality of an item, noting that while negatively discriminating items are undesirable, non-discriminating (B = 0) items are acceptable when F is high, since most students are answering the item correctly. Non-discriminating items with a low facility value need investigating, however, simply on the basis of overall poor performance. A highly positive value of B is not acceptable, since this indicates some problem with the item. While there is no index which can consistently identify whether the source of the problem lies in the item itself or in the instruction preceding it, Brennan suggests the probable sources shown in Table 2. A corresponding pattern for the new index, E, is also given, but its interpretation is closer to that of the traditional index, D.


TABLE 2
DECISION TABLES FOR THE USE OF DISCRIMINATION INDICES B AND E

Index B 
Facility F High Near Zero Negative 
High item? acceptable item 
and/or 
Low instruction? item? instruction? 
Index E
Facility F    E high          E low or negative
High          acceptable      item?
Low           instruction?    item and/or instruction?


It can be argued that for the classroom situation, an item with high facility and 
high value for E (i.e., when most students learn and a greater proportion of all 
students perform as expected than do not) is the most desirable. Items with high 
discrimination but low facility could indicate a fault with the learning situation, 
since overall performance is low even though the item discriminated as planned. 
Low or negative discrimination values with high facility could indicate a definite 
fault with the item, since the lower group (below criterion) tends to answer it 
correctly while the upper group (above criterion) tends to answer it wrongly. If E is 
low or negative and F is low, then the source of the fault could lie either with the 
item or the learning. 


EMPIRICAL RESULTS 


The indices B, E and F were calculated for 179 items from two pre-instruction diagnostic tests and seven post-tests produced for an introductory statistics course for in-service teachers at the University of Ulster. Applying the criteria of Table 2, the level of agreement between the indices was ascertained, revealing much disagreement, as can be seen from Table 3. A total of 51 per cent of the 106 items to which both indices applied received conflicting classifications. Upon closer inspection of the raw data for these items, E was judged to be the more valid and accurate indicator of discriminatory power. A further 73 items could not be considered in the same manner, since either the upper or the lower group was empty and, therefore, B could not be calculated.


The results in Table 3 are not offered as data representative of any larger group, but form the basis for an initial evaluation. They indicate a tendency for the interpretations of the two indices to conflict frequently enough that the two cannot be considered interchangeable. This also emphasises a limitation in the application of the B index for classroom teachers trying to validate their tests, particularly if they are successful at facilitating learning. Obviously, resolving the issue of the validity of either index will involve applying them to additional empirical data.

TABLE 3
A COMPARISON OF DECISIONS MADE ON 179 TEST ITEMS FROM POST TESTS BASED UPON THE
DISCRIMINATION INDICES B AND E (PERCENTAGES FOR ITEMS CLASSIFIABLE BY BOTH INDICES)

                                          Decisions based on E
                                      Acceptable         Unacceptable
                                      (E > 0.70 and      (E < 0.70 and/or
Decisions based on B                   F > 0.60)          F < 0.60)
Acceptable
  (−0.40 < B < 0.40 and F > 0.60)     11 (10%)           27 (25%)
Unacceptable
  (B < −0.40 or B > 0.40,             28 (26%)           40 (38%)
   and/or F < 0.60)
Undecided
  (B undefined, N₁ or N₂ = 0)         66                  7
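The decision rules in Table 3 can be written as two small Python functions, and the cell counts used to check the proportion of conflicting classifications. The thresholds are those printed in Table 3; treating values exactly at a threshold as unacceptable, and the function names, are assumptions of this sketch. On these counts 55 of the 106 items (about 52 per cent) are classified differently by the two indices; the 51 per cent quoted in the text matches the sum of the rounded cell percentages (25 and 26).

```python
def decide_e(E, F):
    """Table 3 rule for E: acceptable if E > 0.70 and F > 0.60."""
    return "acceptable" if E > 0.70 and F > 0.60 else "unacceptable"


def decide_b(B, F):
    """Table 3 rule for B; B is None when the upper or lower group is empty."""
    if B is None:
        return "undecided"
    return "acceptable" if -0.40 < B < 0.40 and F > 0.60 else "unacceptable"


# Cell counts from Table 3, keyed by (decision on B, decision on E)
counts = {
    ("acceptable", "acceptable"): 11,
    ("acceptable", "unacceptable"): 27,
    ("unacceptable", "acceptable"): 28,
    ("unacceptable", "unacceptable"): 40,
}
total = sum(counts.values())
conflicts = sum(n for (on_b, on_e), n in counts.items() if on_b != on_e)
print(total, conflicts, round(100 * conflicts / total, 1))  # 106 55 51.9
```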


Correspondence and requests for reprints should be addressed to Dr. T. R. Black, 
Department of Educational Studies, University of Surrey, C.A.L. Group, Guildford, Surrey, 
GU2 5XH. 
Summary. This paper reports a two-part study of motor impairment in boys with moderate learning difficulties. In the first part, the performance on the Test of Motor Impairment of a group of intellectually retarded boys from a special school was compared with that of a group of boys of the same age from ordinary schools. Significant differences between the groups were obtained, indicating that the boys with learning difficulties were less well co-ordinated. The second part of the paper reports the relationship between the retarded boys' scores on the TOMI and their scores on two other measures of motor performance. A strong relationship was demonstrated.


INTRODUCTION 


In the past, children described as 'clumsy' received relatively little help from either the medical or the educational professions unless the motor problem was perceived as being causally related to other problems which the child was experiencing, such as reading difficulty. More recently, there has been an upsurge of concern for children who find it difficult to acquire the motor skills required of them, regardless of whether these are accompanied by other difficulties. One of the reasons for this change in perspective can be found in the response of the physical education profession to the recommendations of the Warnock Report (1978) and the resulting legislation (Education Act, 1981). The task of planning a physical education curriculum which is suitable for both able-bodied and severely physically handicapped children has heightened awareness of the less well co-ordinated child in general. As a result, there are now many more attempts to help all children with motor learning difficulties within the ordinary school system, as opposed to seeking help elsewhere. These changes have, of course, highlighted the dearth of well-researched instruments which can be used to assess the children who have difficulties and to measure objectively the efficacy of programmes of intervention. The test described in this paper, the Test of Motor Impairment (Stott et al., 1984), was specifically designed to serve these purposes and is one of the few available instruments developed in Britain.


The purpose of this paper is to describe one of the studies which influenced the recent revision of the Test of Motor Impairment (henceforth abbreviated to TOMI). First published in 1972, the test was designed to provide information on the performance of children who lack competence in the motor domain. The Henderson revision of the test was developed between 1977 and 1984 and was standardised on approximately 1,000 children between the ages of 5 and 12. Normative data on children in the UK and Canada are at present available, and a standardisation is now in progress in the USA. As the rationale and form of the revision are described in the manual and several other publications (Henderson, 1984; Laszlo and Bairstow, 1985; Stott et al., 1986), only a very brief overview will be given here.


The revision includes changes to the original test in two major respects. First, 
the test has been quite radically restructured. Some items were deleted, new ones 
were designed and the whole reorganised into fewer levels each with a larger number 
of items. Each level of the test (age band) now contains eight items, measuring
performance along a continuum from gross to fine motor co-ordination. Among the 
various effects of restructuring, perhaps the one most worthy of note here is that the 
test is easier to administer yet yields more information than before. 


The second major change in the battery resulted from a desire to increase the 
usefulness of the test as a guide to planning remediation for the child with 
established motor difficulties. To achieve this, it seemed essential to produce an 
instrument which yielded more than just a composite score or a profile of 
performance as the original test had done. What was required was a means of 
describing the kinds of difficulties a child encounters when he cannot catch a ball, for example, or cut with scissors. A series of checklists was therefore constructed to enable the tester to keep a systematic record of how the child performs the test
tasks. The observations contained in the checklists can be loosely described under 
two headings: those which relate in specifically motor terms to the planning and 
execution of the actions (e.g., does not follow the ball with his eyes) and those which 
describe modes of addressing the tasks which contribute to poor performance (e.g., 
does not attend to the instructions). As these checklists are not used in this study, we 
shall not discuss them further. 


The purpose of the present study was twofold. Our first objective was simply to 
confirm that the revised form of the test was suitable for the same kinds of children 
as the original version. In this instance, our focus was on children who were 
intellectually retarded. Delayed motor development as a concomitant of intellectual 
retardation is now well documented (Rarick et al., 1970, 1976; Bruininks, 1974; 
Sugden and Gray, 1981; Wade et al., 1983) and it is accepted that the assessment of 
such children's motor performance should be considered an integral part of the 
evaluation of their overall level of functioning. Over the years, the original version 
of the TOMI has been used extensively with children whose intellectual handicap 
falls in the mild to moderate range, and it is on this group that we again focus. The
children participating in the study all had IQs in the 50-70 range and had learning 
difficulties which were severe enough to warrant education in a special school. 


We began the investigation of whether the revised form of the test was suitable 
for use with retarded children by making some qualitative observations on how they 
responded to the tasks. It is well known that children who have experienced failure 
in school often do not respond well in formal testing settings. However, the TOMI 
has some particular characteristics which make it feasible for the tester to make the 
situation enjoyable for the child and to minimise any feeling of failure that he or she 
may experience. Our observations, therefore, focused on whether the children 
seemed willing to participate, whether they seemed to understand what was required 
of them and whether they showed any evidence of being bored or fatigued as the test 
progressed. 


In addition to the qualitative observations on the test, a more objective 
evaluation of its suitability was undertaken in the form of a detailed comparison of 
the scores of the retarded children and those of a group of normal children of the 
same sex and chronological age attending ordinary schools. For this purpose, each 
retarded child was individually matched with a child from the standardisation 
sample. This comparison allowed us to examine the extent to which the TOMI was 
capable of detecting motor problems in a population in which the incidence of such 
impairment is known to be high. It also permitted a detailed examination of any 
differences that might emerge between the groups in the pattern of performance 
exhibited on the test. Given some correlation between motor and intellectual 
performance it is likely that a group of moderately intellectually retarded children 
will exhibit a group average motor score that is lower than that of a group of normal
children. It is also likely that there will be more variability between the children in
the retarded group. Often some overlap with the normal children is observed. This 
pattern of performance was obtained in most of the studies cited above and a similar 
pattern would be expected on a test like the TOMI. Failure to find either an 
increased incidence of problems in a group of retarded children or an increase in 
variability in comparison to normal children would tend to suggest that the test is 
not a valid instrument for use with such children. 


Our second, and more important, objective in this study was to add to the
available data on the validity of the test (e.g., Stott et al., 1975; Henderson and Hall,
1982; Drillien and Drummond, 1983; Umney, 1983). One of the questions we wished
to pursue further was the extent to which the test results were in accord with the 
subjective judgments of teachers and other professionals working with children. So 
far, we know that children who were selected by their teachers as having poor motor 
co-ordination for their age performed significantly less well on the TOMI than other 
children in the same classroom (Henderson and Hall, 1982). Also, children referred 
to occupational therapists because they were clumsy did less well than a control 
group of the same age (Umney, 1983). However, these studies required the 
professional to make categorical judgments only. In the present investigation, 
teachers were required to make a much more detailed evaluation of their pupils' 
motor competence. For each child in the study, a 34-item checklist (Keogh et al.,
1979) was completed in which teachers rated their pupils’ performance on a variety 
of classroom and playground activities and gave their views on how the child 
responded to tasks requiring skilled movement. By examining the relationship 
between the children's scores on the TOMI and on the checklist we were able to 
explore more thoroughly the extent to which the inferences that are made about 
children on the basis of their test scores are similar to the judgments made by the 
child's own classroom teacher. 


Another aspect of validity which we wished to investigate further was the 
relationship between the test scores and other performance measures of motor 
behaviour. In 1982, Henderson and Hall reported the relationship between TOMI 
scores and performance on a battery of neuro-developmental tests, and recently, 
Sugden and Wann (1987) have reported the relationship between the Test of 
Kinaesthetic Sensitivity (Laszlo and Bairstow, 1985) and TOMI scores. In this study, 
we chose to examine performance on the class of motor skills often labelled self-help 
skills. Some of the most commonly reported problems experienced by children
described as “clumsy” concern activities like dressing, undressing, eating and
organising their belongings. A strong association between TOMI scores and
performance on self-help tasks would, therefore, provide supplementary evidence of
the former's validity. As no well-developed schedules for such self-help tasks are
seen in the literature, we devised our own. This focused exclusively on dressing
behaviour.


The question of what kind of sample should be used to examine the aspects of 
test validity discussed above is not entirely straightforward. No attempt is made in 
the Test of Motor Impairment to capture the entire range of motor ability. Indeed, 
the form of scoring recommended does not permit one to differentiate between 
children who score above the fifteenth percentile point. This means, therefore, that 
one has to take cognisance of the problems associated with highly skewed 
distributions of scores. Put simply, to show that the 85 per cent of children who 
obtain an error-free score on the test dress themselves competently would not be very
informative. What is of more interest is whether impairment scores among children 
who have established difficulties relate to other measures of motor performance. 
Ideally, what is required is a group of children who have been classified quite 
independently as having problems in motor development. As this option was not 
open to us, however, it seemed acceptable to use the intellectually retarded children
for this purpose. Provided that no administrative problems with the test were 
encountered, the incidence of motor problems was likely to be higher in that group 
and a reasonable spread of scores should be obtainable. Consequently, the part of 
the study in which we examine the relationships between the various tests employs 
only the sample of retarded boys and not the control group. 


The study is presented in two parts. Part 1 deals with the comparison between 
the intellectually retarded children (henceforth labelled MLD) and their normal 
peers on the Test of Motor Impairment and Part 2 deals with the relationship 
between the TOMI, the teachers’ checklist and the dressing assessment. 


PART 1 


In this part of the paper we are concerned with the suitability of the TOMI for 
use with intellectually retarded children. We describe how such children responded 
to being tested and report the comparison between their scores and those of children 
of the same age drawn from ordinary schools. As one of the main features of the 
study was the analysis of the children's skill in dressing, this particular component 
determined our choice of subjects. In order to minimise the possible effects of 
different amounts of practice in dressing we decided to set a lower age limit of 
7 years. For a number of practical reasons relating to the type and size of clothing 
needed and the problem of scoring skills like buttoning, we also decided to select 
only boys as subjects and to set an upper age limit of 10 years. 


MLD group 

Twenty-two boys¹ attending a special school for children with learning
difficulties served as subjects. Their chronological ages ranged from 84 to 131 
months, yielding a mean of 106.4 months. IQ scores derived from a number of
different tests were made available by the school and these were then compared with 
scores on the Raven's Progressive Matrices administered to each child by the first 
author. All of the subjects fell within the appropriate IQ range, 50-70. Although 
many of the boys had difficulties of the kind encountered in such populations (e.g., 
language difficulties) none had a diagnosed specific physical or visual handicap that 
might have biased assessment of their motor performance. 


Control Group 

Each of the MLD boys was matched with a boy from the UK standardisation 
sample. All of the latter attended ordinary schools in a variety of rural and urban 
settings. As the sample was large, it was possible to obtain a very precise match for 
every retarded boy. Each pair of subjects was born in the same month of the same 
year and came from similar socio-economic backgrounds. No child from the 
standardisation sample was eliminated from the above selection on any grounds. 


Procedure 

The TOMI was administered to each child individually in his own school. Each 
boy performed the eight items appropriate for his age in a session lasting 
approximately 30 minutes.² The test is arranged in four age bands to be used with
children aged 5 and 6, 7 and 8, 9 and 10, and 11 upwards. As the boys in our sample
ranged in age from 7 to 10, this meant that the 7- and 8-year-olds were tested on one
age band and the 9- and 10-year-olds on another. The component items contained in
these two age bands and the measures taken are summarised in Table 1. In order to 
maintain motivation and interest in the activities, sitting and standing items were 
alternated so that boredom or fatigue was avoided. Henderson and Schreiber (1982) 
have established that the order of presentation of the items is not important. 
Scoring

Like the original, the revised test is designed to measure motor impairment, not
motor ability in general terms. The score indicates the extent to which a child falls 
below the level of his age peers, and no attempt is made to provide a scale for 
performance that lies above the normal range. For each item the child receives a 
score of 0, 1 or 2: 0 denotes acceptable performance, 1 indicates that the 
performance fell in the lowest 15 per cent of the distribution and 2 indicates the 
bottom 5 per cent. Thus the higher the score, the less competent the child. A total 
score for each level of the test is obtained by summing across items, yielding a
possible maximum of 16 at any one age level. Using the UK standardisation sample 
as a referent, 5 per cent of children have been shown to obtain a total score of 6 or 
more at their own age level, and a further 10 per cent a score of more than 4. 
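The scoring rule above can be written as a short Python sketch. It assumes that each item result has already been expressed as a percentile of the child's age peers in the UK standardisation sample (the manual's raw-score cut-offs are not reproduced here), and it reads the total-score bands from the last sentence above: a total of 6 or more for the bottom 5 per cent, and a total of 5 for the further 10 per cent. All function and constant names are hypothetical.

```python
from typing import Sequence

ITEMS_PER_AGE_BAND = 8  # each age band of the revised test has eight items

# Total-score bands as read from the text (UK standardisation sample).
BOTTOM_5_TOTAL = 6   # "a total score of 6 or more": bottom 5 per cent
BOTTOM_15_TOTAL = 5  # "a score of more than 4": the further 10 per cent


def item_score(percentile: float) -> int:
    """Map a child's percentile on one item to the 0/1/2 impairment score.

    Assumption: low percentiles mean poor performance relative to age peers.
    """
    if percentile <= 5:
        return 2  # bottom 5 per cent
    if percentile <= 15:
        return 1  # lowest 15 per cent
    return 0      # acceptable performance


def total_score(item_scores: Sequence[int]) -> int:
    """Sum the eight item scores; the maximum possible total is 16."""
    if len(item_scores) != ITEMS_PER_AGE_BAND:
        raise ValueError("an age band has exactly eight items")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("item scores must be 0, 1 or 2")
    return sum(item_scores)


def total_band(total: int) -> str:
    """Place a total score in one of the standardisation-sample bands."""
    if total >= BOTTOM_5_TOTAL:
        return "below 5th percentile"
    if total >= BOTTOM_15_TOTAL:
        return "5th to 15th percentile"
    return "above 15th percentile"


if __name__ == "__main__":
    # Hypothetical item percentiles for one child (not data from the study).
    scores = [item_score(p) for p in (40, 12, 3, 60, 90, 14, 30, 50)]
    total = total_score(scores)
    print(scores, total, total_band(total))  # [0, 1, 2, 0, 0, 1, 0, 0] 4 above 15th percentile
```

Because higher totals mean poorer performance, the bands are checked from the worst downwards.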


RESULTS AND DISCUSSION 


We will begin with the qualitative observations that were made on how the boys 
with learning difficulties responded to the Test of Motor Impairment. As evidence 
of reasonable motivation throughout the test, we can point to the fact that no child 
refused to begin the test or to attempt any of the items. Indeed, most children 
seemed to enjoy the experience. One reason for the children's enjoyment of the test 
is that positive reinforcement is available to them on most items. As they are not 
aware of the criterion for pass or fail, the degree of success they achieve can be 
genuinely praised and a second attempt usually leads to improved performance. On 
tasks such as ball catching where failure is impossible to conceal, the tester is advised 
to simplify the task in such a way that some success is achieved. Further evidence of 
reasonable motivation can be inferred from the fact that all of the children not only 
attempted but also completed all eight items in the test. 


An obvious problem that might occur in the assessment of children with 
intellectual retardation is that it might be unclear whether failure on the motor tasks 
was due to lack of motor competence per se or to lack of understanding of the test 
instructions. Insofar as we could judge, there were no instances of failure to 
understand the tasks. A feature of the test is that testers are encouraged to 
demonstrate and describe the items until they feel as confident as possible that the 
child has understood what is required. No specific verbal instructions for the tester 
are included in the manual. It is acknowledged that this may have both positive and 
negative consequences, in terms of the reliability and validity of the test, but on 
balance, it is felt that limiting the tester to one set of instructions is likely to result in 
more children failing for reasons other than lack of motor competence, than if the 
opportunity to be flexible is provided. In sum, using this approach we could find no 
evidence that led us to believe the test was unsuitable for administration to children 
with learning difficulties. Recently, Sugden and Wann (1987) have concurred with 
this viewpoint. 


Let us turn now to the quantitative data obtained on the retarded boys and their 
peers in ordinary schools. Table 1 presents the means, standard deviations and range 
of scores obtained by the two groups on the individual items of the test. Out of a 
total of 16 items there was only one on which the MLD boys had a better score than 
the normal boys. The 9- and 10-year-old MLD boys were slightly faster on a 
pegboard task but the difference between the groups was not statistically significant. 
On most items the differences between the mean scores were strikingly large as 
predicted. In addition, the group with learning difficulties was also more variable 
than the normal group. This can be seen in both larger standard deviations and 
wider ranges of scores. While there were few boys in the normal group who did 
badly on any item, there were always one or two from the MLD group who did well. 
When the raw scores on the test were converted to scaled scores and combined
to produce composite scores the results were similar to those on the individual items.
As can be seen from Table 2, the MLD boys performed much less well than their
peers. Whereas the mean total score for the normal boys was only 1.70, that of the
retarded boys was 10.95, a difference which was statistically highly significant
(t = 8.36; P < 0.0001). As might be expected, the pattern of total scores for the
retarded boys mirrored that obtained on the individual items. A higher mean score 
was accompanied by much more variability in the MLD group. In the control group, 
selected from the standardisation sample, all of the boys scored less than 6 and 13 
scored less than 2. In contrast, the scores of the MLD group ranged from 2 to 16. 
Four of the boys scored less than 4 placing them within the top 85 per cent of the 
score distribution. These boys would not be considered to have serious motor 
difficulties. At the other end of the scale 18 boys scored more than 6, placing them 
below the fifth percentile. Five of these boys obtained a maximum of 16 points 


TABLE 1
COMPARISON BETWEEN THE GROUPS ON INDIVIDUAL TEST ITEMS

Age Band 2 (Age 7-8 years)
                                                    MLD Boys                Normal Boys
Task                 Measure Recorded          Mean   SD     Range     Mean   SD     Range
Pegs in Board        Time Taken           P    –      –      –         19.53  1.89   –
                                          O    –      –      –         22.38  2.22   –
Lacing Board         Time Taken                24.46  6.03   –         –      –      –
Flower Trail 2       No. of Deviations         10.46  6.69   0-20      2.15   2.44   0-6
Bounce and Catch     No. of Catches       P    3.77   4.55   0-10      9.08   1.44   5-10
                                          O    3.15   4.32   0-10      7.92   1.80   4-10
Throw Bean Bag       No. of Successes          2.23   2.62   0-7       8.0    1.87   4-10
Stork Balance        Time Balanced        P    5.46   7.29   0-20      16.8   5.22   5-20
                                          O    3.69   6.49   0-20      17.2   5.92   2-20
Jump in Squares      No. of Correct Jumps      2.46   1.66   0-5       4.69   0.48   4-5
Heel Toe Walk        No. of Correct Steps      3.69   4.68   0-15      13.6   3.33   4-15

Age Band 3 (Age 9-10 years)
                                                    MLD Boys                Normal Boys
Task                 Measure Recorded          Mean   SD     Range     Mean   SD     Range
Shift Pegs by Rows   Time Taken           P    16.67  5.43   –         17.78  1.39   15-20
                                          O    16.78  4.63   –         18.67  2.00   16-22
Thread Nuts on Bolts –                         24.88  7.15   12-32     –      –      –
Flower Trail 3       No. of Deviations         –      –      –         –      –      –
Two Hand Catch       No. of Catches            –      –      –         –      –      –
Throw Bean Bag       –                         –      –      –         –      –      –
One Board Balance    Time Balanced        P    –      –      –         17.56  3.97   –
                                          O    –      –      –         –      –      –
Hop in Squares       No. of Correct Hops       –      –      –         –      –      –
Balance Ball         No. of Drops              3.0    3.42   –         –      –      –

P = Preferred hand (or foot); O = Other hand (or foot)
TABLE 2
TOMI TOTAL SCORES FOR THE BOYS WITH LEARNING
DIFFICULTIES AND THEIR MATCHED CONTROLS

C.A. of Pair (Months)     MLD Group     Control Group

indicating that they did not reach the pass criterion on a single item at their own age
level.²


In summary, this pattern of results is very similar to that obtained using other 
measures of motor performance (Bruininks, 1974; Rarick et al., 1976). Group mean 
scores on both the individual test items and on the composite scores were 
consistently higher (i.e., worse) for the children with learning difficulties than for 
the children in ordinary schools. In addition, the range of scores within the MLD 
group exceeded that of the normal children resulting in some overlap. These results 
tend to confirm the view that the revision of the Test of Motor Impairment created 
no unforeseen difficulties with respect to its suitability for the assessment of 
intellectually retarded children. 
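Because each MLD boy was matched with a control of the same age, one natural way to compare the two sets of total scores is a paired t-test on the within-pair differences. The paper does not state which form of t-test produced the value of 8.36, so the Python sketch below is an assumption rather than the authors' procedure; the function name and the pair scores in the example are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(mld: Sequence[float], control: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched pairs.

    mld[i] and control[i] must be the total scores of the i-th matched pair.
    """
    if len(mld) != len(control) or len(mld) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [a - b for a, b in zip(mld, control)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero spread; t is undefined")
    # t = mean difference / standard error of the mean difference
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical totals for three matched pairs (not data from the study).
    t, df = paired_t([9, 12, 16], [1, 2, 0])
    print(f"t = {t:.2f}, df = {df}")
```

With the 22 pairs of the present study the test would have 21 degrees of freedom.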


PART 2 


In the second phase of the investigation, we examined the relationship between 
performance on the Test of Motor Impairment and that on two other measures of 
motor competence, obtained in ways different from that employed in the test: (1) a 
comprehensive teacher's checklist and (2) a specially designed assessment of the 
children's competence in dressing. If it could be shown that scores on both of these 
measures of motor performance bore a reasonable relationship to scores on the 
TOMI then we would have strong grounds for concluding that the dimensions of
motor behaviour measured by the test relate to the strengths and weaknesses of the 
child's competence in daily living. 


METHOD 
Subjects 
Only the boys with learning difficulties participated in this part of the study. 
Their characteristics were described in Part 1. 


Procedure 
The administration and scoring of the TOMI are described above. The other two
tests are now described in more detail. 


(1) The Teacher's Checklist

In contrast to the many checklists available for the recording of social and 
emotional aspects of behaviour, there are very few indeed which deal with motor 
behaviour. We chose that of Sugden, Keogh and their colleagues (Sugden, 1972, 
1983; Keogh et al., 1979) because it is one of the most comprehensive and seemed to 
contain observations which classroom teachers would be able to make without 
difficulty. The checklist is divided into three sections of thirteen, nine and ten items 
respectively. The first two sections contain direct descriptions of motor behaviour 
which the teacher can observe in the classroom or in the playground (e.g., sits with 
good posture when working at table or desk). The third requires the teacher to rate
the child's behaviour in relation to motor skills (e.g., passive: requires
encouragement to participate).


Scoring 

Each item is rated on a four-point scale. For Sections I and II the rater first 
decides whether or not the child can do the task described, then chooses the degree 
of achievement. If the child can complete the task competently and efficiently he 
scores 1, if he can do it but not fluently he scores 2. If he is near to being able to do it 
he scores 3, and if he is far from being able to accomplish it, 4. For Section III the rater
indicates whether the behaviours are characteristic of the child. If the behaviour is
not characteristic at all he scores 1, occasionally characteristic 2, sometimes 3 and
often 4. A total score is obtained by summing the points for each of the 34 items. A
high score is indicative of difficulty with movement skills.³
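The summing rule just described can be sketched in Python as follows. The section labels and the dictionary layout are hypothetical, and the sketch deliberately does not hard-code the number of items per section, only the 1 to 4 rating range. Passing a subset of sections gives a subtotal over Sections I and II alone, corresponding to the "performance sections" considered separately in the analysis of Part 2.

```python
from typing import Dict, Iterable, List, Optional

# Each item is rated 1 to 4; a higher total indicates more difficulty with movement skills.
VALID_RATINGS = {1, 2, 3, 4}


def checklist_total(ratings: Dict[str, List[int]],
                    sections: Optional[Iterable[str]] = None) -> int:
    """Sum the item ratings over the chosen sections (all sections by default)."""
    chosen = list(ratings) if sections is None else list(sections)
    total = 0
    for name in chosen:
        for rating in ratings[name]:
            if rating not in VALID_RATINGS:
                raise ValueError(f"section {name}: rating {rating} is not between 1 and 4")
            total += rating
    return total


if __name__ == "__main__":
    # Hypothetical, shortened ratings for one child (the real sections are longer).
    child = {"I": [1, 2, 3], "II": [2, 1], "III": [4, 1]}
    print(checklist_total(child))              # 14: all three sections
    print(checklist_total(child, ["I", "II"]))  # 9: motor performance sections only
```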


(2) The Dressing Skills Assessment 

The Dressing Skills Assessment was devised by the first author. Four 
component skills were included: (1) putting on a shirt which had to be pulled over 
the head; (2) fastening three buttons at the neck of the shirt; (3) putting on long 
socks; (4) putting on shoes. To develop a measure of performance on the dressing 
tasks considerable pilot work was undertaken. Each task was first analysed into a 
series of component items which seemed to constitute all of the subskills necessary to 
complete the task. This preliminary analysis was partly intuitive and partly derived 
from existing literature. The preliminary analysis was then tried out and
adjusted where necessary. The resultant number of steps for putting on a shirt was
10, buttoning 12, putting on socks 10 and shoes 8. Full details of the task analyses 
are available in Lam (1982). 


In order to be able to examine the performance of each child in detail and to 
establish the reliability of the scoring method used, a video tape was made of each 
subject attempting the tasks. The procedure and materials employed were the same 
for every child. Three sets of identical clothing were provided to suit boys of 
different age and body size. Before beginning, the layout of the clothes was indicated
and the starting position for each task specified. The child was told to stand while 
putting on the shirt and sit down when putting on the shoes and socks. He was then 
asked to put on the clothes as quickly as possible. 


Scoring 

For each task two scores were derived, a time score and a performance score. 
The time score was a measure of the total time taken to complete each of the four 
tasks. A performance score was recorded for each step of the four tasks. Each item 
was scored on a three-point scale from 0 to 2: 2 denoting item completed 
independently; 1, item completed with some assistance, physical, gestural or verbal; 
0, item completed with maximum assistance or not completed. The total possible 
score for each task was therefore: Shirt 20, Buttons 24, Socks 20 and Shoes 16. The 
possible overall total was 80. 
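The performance score can be sketched in Python, taking the step counts from the task analyses described earlier (shirt 10, buttoning 12, socks 10, shoes 8) and the 0/1/2 scale above. The function names and the dictionary layout are hypothetical; the time score, being simply the total time per task, is not modelled.

```python
from typing import Dict, List

# Steps per dressing task, from the task analyses described above.
STEPS: Dict[str, int] = {"shirt": 10, "buttons": 12, "socks": 10, "shoes": 8}
# Step scores: 2 = completed independently, 1 = completed with some assistance,
# 0 = completed with maximum assistance or not completed.
VALID_STEP_SCORES = {0, 1, 2}


def max_scores() -> Dict[str, int]:
    """Highest possible performance score for each task (2 points per step)."""
    return {task: 2 * n for task, n in STEPS.items()}


def performance_scores(step_scores: Dict[str, List[int]]) -> Dict[str, int]:
    """Per-task performance scores plus an overall total (maximum 80)."""
    result: Dict[str, int] = {}
    for task, n_steps in STEPS.items():
        scores = step_scores[task]
        if len(scores) != n_steps:
            raise ValueError(f"{task}: expected {n_steps} step scores, got {len(scores)}")
        if any(s not in VALID_STEP_SCORES for s in scores):
            raise ValueError(f"{task}: step scores must be 0, 1 or 2")
        result[task] = sum(scores)
    result["total"] = sum(result[task] for task in STEPS)
    return result


if __name__ == "__main__":
    print(max_scores())  # {'shirt': 20, 'buttons': 24, 'socks': 20, 'shoes': 16}
    # Hypothetical child: independent on every step except two buttons needing help.
    child = {task: [2] * n for task, n in STEPS.items()}
    child["buttons"][0] = 1
    child["buttons"][1] = 1
    print(performance_scores(child))  # {'shirt': 20, 'buttons': 22, 'socks': 20, 'shoes': 16, 'total': 78}
```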


As this test was designed specifically for this study, it was essential to establish 
the reliability of the scoring system used. In addition to the first author, an 
independent observer rated the performance of five boys randomly selected from the 
sample. Agreement between the two raters on the time scores was almost perfect
(r = 0.99). On the more qualitative observations agreement on the four tasks ranged
from 83.75 per cent to 100 per cent with a mean of 93.08 per cent.
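The paper reports agreement on the step scores as percentages without giving the formula. A common definition, assumed here, is simple percent agreement: the share of scored steps on which both raters gave the same 0, 1 or 2, computed task by task and then averaged over tasks, as with the mean figure quoted above. The Python sketch below uses hypothetical function names and invented ratings.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def percent_agreement(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Percentage of scored steps on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * agreements / len(rater_a)


def agreement_by_task(
    ratings: Dict[str, Tuple[Sequence[int], Sequence[int]]]
) -> Dict[str, float]:
    """Percent agreement for each task, plus the mean across tasks."""
    per_task = {task: percent_agreement(a, b) for task, (a, b) in ratings.items()}
    task_mean = mean(per_task.values())
    per_task["mean"] = task_mean
    return per_task


if __name__ == "__main__":
    # Hypothetical step scores from two raters (not data from the study).
    print(agreement_by_task({
        "shirt": ([2, 2, 1, 0], [2, 1, 1, 0]),
        "shoes": ([2, 2], [2, 2]),
    }))  # {'shirt': 75.0, 'shoes': 100.0, 'mean': 87.5}
```

Simple percent agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa would be an alternative.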


RESULTS AND DISCUSSION 


Table 3 presents the matrix of correlations between the TOMI and the other 
measures of motor performance taken on each child: the teacher checklist score, the 
performance score on the dressing tasks and time score on the dressing tasks. 
Separate analyses were completed for the 7-8 year-olds and 9-10 year-olds, but as the 
matrices were very similar they have been collapsed into the table shown. Overall, 
the outcome is satisfactory in that the correlations between the TOMI and the other 
scores are highly statistically significant. 


Although the correlations appear to differ considerably, ranging from 0.58
between the TOMI and the teacher checklist to 0.87 between dressing time and
dressing competence, none of the differences between the correlations reached
statistical significance (maximum CR = 1.68).


In order to explore further the nature of the relationships between TOMI and 
the other assessments we also performed some additional analyses on specific 
elements of the various tests. For example, the TOMI contains some items in which 
speed is a critical factor and others in which it is not. One question that might be 
posed, therefore, is whether there is a particularly strong relationship between this 
subset of items and the time taken to complete the dressing tasks. This did seem to 
be the case. When only the items involving fast finger movements were considered in 


TABLE 3
CORRELATIONS BETWEEN THE FOUR MEASURES TAKEN

                        Dressing       Dressing    Teacher
                        Performance    Time        Checklist
TOMI                    0.72**         0.69**      0.58*
Dressing Performance    –              0.87**      0.65*
Dressing Time                          –           0.57*
Teacher Checklist                                  –

*P < 0.05; **P < 0.01





398 Test of Motor Impairment 


relation to dressing times the minimum correlation obtained was 0.85. For example,
the correlation between the time taken to thread a lace on to a board, a TOMI item,
and buttoning time was 0.95 and between screwing nuts onto a bolt, another TOMI
item, and buttoning time was 0.87.


Another example of a more specific relationship between TOMI and the other
measures was obtained when only the performance aspects of the teachers’ checklist
were considered. Although the section dealing with the child's attitude and approach
to motor tasks is very informative at a clinical level, it does require different sorts of
observations from the teacher and in a number of ways stands apart from the other
two components. When we examined the relationship between the TOMI scores and
the performance sections of the teacher checklist the correlation was found to
increase from 0.58 to 0.88.


A different way of examining how closely the various measures of motor competence
agree is to look at the ranks assigned to the children by
each method. Using the TOMI as the reference measure, Table 4 shows the rank of
the four best and four worst boys in the group on the other measures and the mean
rank for each group. As far as the top four boys are concerned, the teachers'
judgments seem to be slightly at odds with the other measures. However, this may be
due to the fact that the teachers were more careful to differentiate between the less
competent children than between those who seemed capable. With respect to the four boys
who had maximum scores on the TOMI, there was very close agreement between 
their test scores and their ranks on the other measures. 


TABLE 4 


RANKS OF THE MLD BOYS GAINING FOUR LOWEST (I.E., BEST) SCORES AND FOUR HIGHEST (I.E., WORST)
SCORES ON THE TOMI


Mean Mean 
Four Lowest Rank Four Highest Rank 


TOMI                    20=   20=   20=   20=


Dressing Performance    22    20    18=   16
Dressing Time           21    19    18
Teacher Checklist       16    13





The relationship between the TOMI and each of the other measures raises one
or two questions of practical import. With respect to the TOMI and the dressing
test, for example, there is the question of whether the relationship between the two
tests stemmed from a common set of impaired processes or whether all that can be
concluded is that children who are described as “clumsy” are poor at a variety of
motor skills and, therefore, fail on both measures. Of course, this is but the 
generality vs. specificity issue in another guise (see Schmidt, 1982, for a discussion). 
In the literature on normal motor skills development, evidence of the specificity of 
achievement on particular skills is nearly always emphasised over evidence of a 
general factor underlying motor ability. Teachers and therapists, of course, find this 
particularly depressing as the prospect of teaching such children all the necessary 
skills as separate tasks is daunting to say the least. Clearly, there is no way of using 
this kind of study to answer the question but it raises the possibility that the study of 
motor impaired children might yield rather different results from studies of normal 
children and that such results might shed some light on the issue. 


As far as the ratings made by the teachers in this study were concerned the 
relationship with the TOMI was rather better than that between teacher ratings and 


YIN YUK (J.) LAM AND SHEILA E. HENDERSON 399 


standardised test scores, reported in other studies (e.g., Keogh et al., 1979). In our 
view, the crucial factor is the kind of teachers involved. Generally, teachers are not 
well trained in issues of motor development and consequently they find it difficult to 
make detailed judgments of their pupils’ performance. However, in a previous study
using the TOMI, the teachers taking part had spent several weeks in the presence of
the testers engaged in the standardisation and had participated in many discussions
of what “clumsiness” meant. Their judgments proved to be very accurate
(Henderson and Hall, 1982). In the present study, the teachers had a special interest 
in children with difficulties and, moreover, were used to observing their pupils' 
behaviour systematically. This probably made it easier for them to complete the 
questionnaire than it otherwise might have been. Obviously, there is a need for 
systematic study of the judgments of different kinds of teachers with different 
amounts of training and in different settings. However, we have shown here that 
teachers can progress beyond global judgments of their pupils’ motor competence 
and that their judgments accord with those derived from a normative test. 


CONCLUSION 


In the introduction to this paper, the need for a well researched test for the 
assessment of motor difficulties in children was noted. We suggest that the Test of 
Motor Impairment satisfies this requirement. 


The test was devised to identify and describe motor problems that exist in 
children who do not suffer from any clear physical or neurological disorder. It is not 
suitable for use with children so severely physically handicapped that they cannot 
perform basic skills such as sitting, standing or grasping. Unfortunately, no one has 
yet solved the problems of producing a test that is sensitive to the complete range of 
motor ability (see Henderson, 1987, for a review of the issues). Among the 
heterogeneous group of children who could not conceivably be formally classified as 
physically handicapped but who are, nevertheless, often described as having 
considerable motor difficulties, one of the largest sub-groups consists of children 
who have mild to moderate intellectual difficulties. It is essential, therefore, that any 
instrument which purports to be suitable for children with moderate motor 
difficulties be suitable for use with mentally retarded children too. In this study we 
present evidence which suggests that the test does serve this function satisfactorily. 


FOOTNOTES 


1. In the original study there were 26 boys in the sample, four over the age of 10. As the final version of 
the 11+ level of the TOMI was not then available they could not be tested on items appropriate for 
their age. We have therefore eliminated their data from this report. 


2. In this study each child completed only one level of the TOMI that was appropriate for his age. For 
clinical purposes it is recommended that a child who scores more than 6 on any one level be tested at 
lower levels to establish a baseline and/or profile of performance. Such information is crucial in
planning an intervention programme which is appropriate to the child's age and capabilities.


3. Some of the results presented in this paper differ slightly from those of Lam (1982). As the criteria for 
scoring the test items had not been finally established when the dissertation was written some of the 
raw data was rescored. Also, after 1982 Sugden produced a more consistent method of scoring the 
teacher checklist which we have adopted here. 
SUMMARY. This paper reports on the application of the Goodenough-Harris DAM test to a
representative sample of 1,195 Iranian children (548 boys and 647 girls) aged between 6 years and 13 
years and enrolled in grades 1 to 5 of eight public primary schools in the city of Shiraz, southern 
Iran. As expected, the mean DAM scores of these children increased substantially from grade 1 to 
grade 5 and from age six years to age 11 years. There were, however, significant sex differences, in 
favour of girls, across all age and grade levels. Children from high SES schools also scored 
significantly higher than those attending low SES schools. DAM scores correlated from 0.30 to 0.45
(P < 0.01) with such criteria of academic achievement as teachers’ ratings and marks obtained in
mid-term examinations. The split-half reliability of the test was found to be around 0.80 while its
test-retest reliability over a 12-week interval was 0.75. The mean score of Iranian children, however,
fell substantially below the American norms reported by Harris. 


INTRODUCTION 


The use of drawings of human figures as a measure of the intellectual ability of children 
goes back to the early days of the mental testing movement (Burt, 1921). The standardisation 
of a scoring system for such drawings by Goodenough (1926) and its later extension and 
refinement by Harris (1963) has resulted in the emergence of the draw-a-man test (DAM) as a 
common and accepted measuring instrument for the quick assessment of children's overall 
intelligence (Sattler, 1982). Its use as the main measure of intellectual performance in several 
large-scale epidemiological surveys (e.g., Harris and Pinder, 1974, 1977; Abraham et al.,
1979) bears testimony to the continuing popularity of this test. 


A recent review of the pertinent literature by Scott (1981), while confirming the value of 
DAM as a highly reliable measure of certain perceptual-cognitive behaviours, has raised 
serious doubts regarding its predictive and concurrent validity in terms of correlations with 
academic performance and with such other measures of intelligence as the Stanford-Binet and 
the Wechsler Intelligence Scale for Children. 


From a cross-cultural point-of-view, while Dennis (1966) has shown a strong relationship 
between the DAM scores and such cultural experiences as literacy and visual arts, a large 
number of studies in quite different cultural contexts (Phatak, 1959, 1961; Bakare, 1977; 
Georgas and Papadopoulou, 1968; Laosa et al., 1974; Kagitcibasi, 1979) have demonstrated 
the value of this test as a simple and easily applicable measure of intellectual development. The
present paper reports some normative data on a group of Iranian children and certain 
preliminary evidence regarding the reliability and validity of the test in the Iranian cultural 
context. 


METHOD 

Subjects 

Subjects were 1,195 children (548 boys and 647 girls) attending grades 1 to 5 of eight 
public primary schools within the city of Shiraz. Shiraz is one of the five major urban centres 
of Iran. The children were selected in such a way as to provide a representative cross-section of 
the children attending normal primary schools in the city of Shiraz. The selection was in terms 
of schools with known catchment areas. Previous research by the present authors (Mehryar 
and Tashakkori, 1984) and the Population Center of Shiraz University (Gharakhani and 
Mehryar, 1978) indicated the practicality and validity of such a sampling strategy. Of the eight 
schools covered, four are known to cater for children from predominantly lower working class 
segments of the city. The remaining four schools are situated in upper-middle class areas of
the city. Like the rest of the Iranian educational system, these are all state owned single-sex
schools predominantly staffed by female teachers. 


In each school one class was randomly selected at each grade (1 to 5), and all students in
those classes were presented with the DAM as described below. In the second phase of the
study, conducted 12 weeks later, two schools (one boys' and one girls') were selected from
each set of four schools representing the lower and upper-middle SES categories. The 569
subjects in these schools were retested with the DAM; because 17 of them were absent at the
time of retesting, there were only 552 retests. Teachers responsible for each class were
asked to rate each child on three 5-point rating scales evaluating intelligence, overall scholastic
performance, and interest in school. In two classes the teachers were new, and for that reason
they were not asked to evaluate the subjects in their classes. Hence, a total of 510 evaluations
were obtained. The scores of all 569 subjects on the semi-formal mid-term examinations were
available at the time of the retest and were obtained from the school files.


Instrument and procedures of testing 
Children were tested in groups of 30 to 40, depending on the size of their class. There was 
no time limit but the majority were able to finish their drawings within 20 minutes. 


The drawings were scored according to the criteria developed by Harris (1963). In order 
to make comparisons across different age and sex groups, the raw score obtained by each 
individual was transformed into a standard score with a mean of 100 and a standard deviation 
of 15. For this purpose, norms summarised in Harris’s (1963) Tables 32 to 35 were used as 
fixed criterion groups. 
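The conversion to a scale with mean 100 and SD 15 can be illustrated as a linear z-score transformation against a criterion group. This is a minimal sketch under stated assumptions. Harris's (1963) Tables 32 to 35 tabulate the equivalents directly, and here each age and sex criterion group is assumed to be summarised by a single mean and SD. The norm values in the example are hypothetical placeholders, not figures taken from Harris's tables.

```python
def standard_score(raw, norm_mean, norm_sd, new_mean=100.0, new_sd=15.0):
    """Convert a raw DAM score to a standard score (default mean 100, SD 15)
    relative to a criterion group with the given mean and SD.
    Assumes a purely linear conversion (an illustrative simplification)."""
    if norm_sd <= 0:
        raise ValueError("criterion-group SD must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return new_mean + new_sd * z

# Hypothetical criterion group with mean 20.0 and SD 6.0 (illustrative values only)
print(standard_score(23.0, 20.0, 6.0))  # 107.5
```

Because each child is compared with the norm for his or her own age and sex, standard scores can fall with age even while raw scores rise. That is the pattern this paper reports for its Iranian sample.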


RESULTS AND DISCUSSION 


Table 1 presents the means and standard deviations of the raw DAM scores and their
standard score equivalents for children aged 6 through 11 years, separately for boys and girls.
As expected, the table shows that raw DAM scores increase substantially from age 6
(grade 1) to age 10 (grades 4-5). The 11-year-old groups, however, show some decline in mean
DAM score compared with the immediately preceding age groups. This is apparently due to
the presence of an unduly large proportion of older, and presumably duller, children at this age
and grade level, whose academic failure has necessitated repetition of the more formally
evaluated 5th grade, the final year of primary education. Nevertheless, age and raw DAM
scores correlate 0.43 (P < 0.01) and 0.36 (P < 0.01) for girls and boys, respectively. Moreover,
at all age levels, girls as a group scored higher than boys. A two-way analysis of variance
showed that the overall age and sex differences summarised in Table 1 were highly significant
(Main Effect, F = 57.02, df = 6, P < 0.001; Sex, F = 33.3, df = 1, P < 0.001; Age, F = 58.0,
df = 5, P < 0.001), the interaction between age and sex being non-significant (F = 0.92, df = 5,
P < 0.46). Comparing boys and girls of different ages, the observed differences in favour of
girls were highly significant at all but two age levels, i.e., age 7 and 11 years.


TABLE 1
MEANS AND STANDARD DEVIATIONS OF IRANIAN CHILDREN ON DRAW-A-MAN TEST BY AGE AND SEX

                        Girls                            Boys                             Total
                 Raw score    Scaled score        Raw score    Scaled score        Raw score    Scaled score
Age group     N   Mean    SD    Mean     SD    N   Mean    SD    Mean     SD    N   Mean    SD    Mean     SD
6 years      88  18.87  7.23   93.16  16.22  113  15.39  5.77   90.91  14.62  201  16.91  6.66   91.90  15.34
7 years      74  20.46  5.59   89.39  12.50   86  19.42  8.11   90.99  15.12  160  19.90  7.05   90.25  13.95
8 years     142  24.50  6.78   89.40  11.58   83  21.29  6.71   88.36  11.00  225  23.32  6.91   89.01  11.34
9 years     133  25.08  7.16   84.41  12.93   81  23.09  6.94   85.43  11.24  214  24.33  7.13   84.79  12.30
10 years    122  28.74  8.40   83.12  12.72  106  25.99  7.16   85.88  11.54  228  27.46  7.95   84.40  12.24
11 years     88  26.58  6.84   73.52  10.83   79  25.25  7.10   79.59  11.89  167  25.95  6.95   76.39  11.71
Total       647  24.48  7.80   85.54  14.01  548  21.52  7.91   87.12  13.27 1195  23.12  7.85       -      -


The standardised score equivalents of the raw scores, also summarised in Table 1, show a
definite decline from the youngest to the oldest age group among boys and girls. This negative
trend is represented by correlation coefficients of -0.29 (P < 0.01) and -0.41 (P < 0.01)
between age and standardised DAM scores of boys and girls, respectively. Moreover, the
mean standard scores of Iranian children would appear to be substantially lower than Harris's
(1963) normative data used as criterion groups. The differences range from less than half a
standard deviation in the case of the youngest age groups to over one and a half standard
deviations in the case of the oldest age groups.


The overall inferior performance of Iranian children vis-a-vis their American age-mates 
may be partially due to their relative disadvantage with regard to prior experiences with play 
involving paper and pencil and a general lack of emphasis on artistic expression in Iranian 
culture and education (Dennis, 1966). (As an indication of the current status of arts in Iranian 
schools, it may be of interest to note that a recent study conducted at several primary schools 
in Shiraz city revealed that not a single child of several hundred pupils had mentioned painting 
and other artistic subjects either as one of their favourite lessons or as their aspired future 
occupation.) Similarly, lower performances have been noted for other Iranian samples on a 
variety of intelligence tests adopted from the West (Mehryar and Shapurian, 1970, 1972). 
Another explanation is that, as Scott (1981) has noted, Harris's (1963) standardisation groups 
were probably biased towards the upper end of the distribution of the IQ. Indeed, several 
large-scale studies recently carried out in the USA have obtained mean standard scores (using 
Harris's 1963 norms as criterion reference groups) of almost 10 IQ points below the norms 
reported by Harris. This upward bias would, moreover, seem to increase substantially from 
age 12-11 to age 6-5. 


Reliability 

Two different methods of estimating reliability were employed. A split-half measure of
reliability was calculated by comparing the sums of scores obtained on the odd and even items
(or scoring criteria). Repeating the test after an interval of 12 weeks on a subsample of 552
children provided a measure of test-retest reliability. The test-retest reliability coefficient was
0.75 (P < 0.001). The split-half reliability estimates were calculated both for the first
administration and its repetition after 12 weeks. The correlation coefficients obtained were
0.67 (N = 1195) and 0.69 (N = 552) for the first and the second administrations, respectively.
Corrected by the Spearman-Brown formula, these two correlations yield split-half reliability
estimates of 0.80 and 0.82 for the first and the second administrations of the test,
respectively. Although these figures are not as high as one would have liked, they compare
well with the range of test-retest reliability coefficients reported by Harris (0.68 to 0.91) and
by later studies reviewed by Scott (0.53 to 0.87).


Validity 

The predictive validity of the DAM test in its new context was investigated in two
somewhat different ways. In the first place, teachers responsible for each class were invited to
rate each child in terms of overall intelligence, scholastic promise and general interest in school
work. Simple, five-point rating scales were used for this purpose. In the second place, marks
obtained by each child in the semi-formal mid-term examination held by the school were
employed as criteria of academic performance. There were five different marks representing
each child's performance in the five major subjects which constitute the common and
centrally determined curriculum of Iranian primary schools. The five subjects are Reading,
Science, Dictation, Arithmetic and Religious Education. Table 2 gives the correlation
coefficients obtained between these eight criterion variables and the standardised score
equivalents of the DAM scores for the sample as a whole. To obtain further confirmation of
the predictive validity of the test, the validity coefficients presented in Table 2 have been
calculated separately for the first (N = 510) and the second (N = 494) administrations of the
test. The resulting coefficients of correlation, as shown in Table 2, although not very high, are
highly significant. They also compare quite well with correlations obtained between the DAM
test and a variety of measures of academic performance reported by western researchers
(Scott, 1981).
TABLE 2
CORRELATIONS BETWEEN THE STANDARDISED DAM SCORES AND MEASURES OF ACADEMIC
PERFORMANCE FOR A SUBSAMPLE OF IRANIAN CHILDREN TESTED TWICE AT 12-WEEK INTERVAL

Criteria of academic performance        Test DAM Score    Retest DAM Score

Teachers' ratings on:                     (N = 510)          (N = 494)
  Overall achievement                       0.37               0.40
  General intelligence                      0.38               0.37
  Interest in school work                   0.24               0.24

Mid-term examination marks in:            (N = 569)          (N = 547)
  Arithmetic                                0.38               0.40
  Dictation (Spelling)                      0.30               0.32
  Science                                   0.45               0.45
  Reading                                   0.43               0.44
  Religious Education                       0.30               0.34

Note. All correlations are significant at P < 0.001. Religious Education is taught only in
grades 3 to 5, the N being 340 (at Test) and 332 (at Retest).
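The note that every coefficient in Table 2 is significant at P < 0.001 can be checked with the usual test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes a standard normal approximation to the two-sided p value. With several hundred cases per coefficient this approximation is reasonable, but an exact value would use the t distribution with n - 2 degrees of freedom. The helper name is hypothetical.

```python
import math

def correlation_test(r, n):
    """t statistic for H0: rho = 0, plus an approximate two-sided p value
    (standard normal approximation to t with n - 2 df)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail area of N(0, 1)
    return t, p

# Smallest coefficients in Table 2, paired with their sample sizes
cases = {
    "Interest in school work (test)": (0.24, 510),
    "Interest in school work (retest)": (0.24, 494),
    "Religious Education (test)": (0.30, 340),
    "Dictation (test)": (0.30, 569),
}
for label, (r, n) in cases.items():
    t, p = correlation_test(r, n)
    print(f"{label}: t = {t:.2f}, p = {p:.1e}")
```

Even the weakest coefficient, 0.24, gives t of roughly 5.5 at these sample sizes, far beyond the value needed for P < 0.001.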


In addition to the evidence of predictive validity presented above, there are two main
sources of evidence regarding the construct validity of the DAM test in the present study. One
of these, an increase in DAM score with age and grade level, has already been discussed.
Another lies in the ability of the DAM scores to differentiate between children from low and
high socio-economic backgrounds. Such social class differences in measured intelligence in
Iran have previously been reported by Mehryar (1972) for a relatively large sample of Iranian
adolescents. In the present study, children were divided into two main groups in terms of the
location of their schools. One group, called high SES (N = 530), is enrolled in four schools
situated in a modern, middle class residential area of the city. The second group (N = 665),
called low SES, attends the four schools located in the predominantly lower class business
and residential areas of the city. The mean DAM scores of these two groups are 25.42 and
21.29, respectively. The scaled score equivalents of these means, using Harris's norms, are
91.42 and 82.16. Both differences are highly significant and attest to the construct validity of
the DAM test in differentiating children from different socio-economic and family
backgrounds.
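As a consistency check, the two SES group means can be combined, weighting each by its group size. The result should equal the overall raw-score mean of 23.12 reported for all 1,195 children in Table 1. The helper below is a hypothetical illustration and is not part of the original analysis.

```python
def pooled_mean(groups):
    """N-weighted mean of subgroup means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

overall = pooled_mean([(530, 25.42), (665, 21.29)])  # high SES, low SES
print(f"{overall:.2f}")  # 23.12
```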


As a further mark of the differential validity of the DAM in the Iranian setting, it may be 
noted that our attempts to use this test with retarded children and adolescents attending 
special institutions in Shiraz city have mostly resulted in almost totally unscorable drawings. 


CONCLUSIONS 


From the data presented above, the Draw-A-Man test would appear to be a simple,
relatively consistent, and discriminating measure of cognitive efficacy or intelligence in the
Iranian context. Its positive correlation with age indicates that the ability or
collection of abilities that may underlie performance on this test is subject to maturational
changes. The differences between children from contrasting SES backgrounds, however,
bear witness to the importance of social-environmental conditions for the proper
materialisation of this maturational potential. Regardless of their basic nature and
components, the kinds of response sampled by the DAM test have some significant, albeit
modest, bearing on the academic performance of children.


All the psychometric weaknesses noted for the DAM test are also shared by other western
tests of intelligence so far tried on Iranian subjects (Valentine, 1959; Mehryar and Shapurian,
1970, 1972; Hosseini and Razavieh, 1972; Bash and Bash-Liecht, 1984). The only thing
that would seem to differentiate the DAM from other measures of intelligence so far used in
this country is its more favourable results with girls. Indeed, this is the first time that, in a
relatively large and unselected sample, Iranian girls have obtained significantly higher scores
than Iranian boys. This finding is, of course, in line with those of several studies reported
from different western countries (Harris, 1963; Scott, 1981). Thus, investigators interested in
controlling test bias against girls in this part of the world may find it useful to include the DAM
test in their test battery.


With regard to the substantially lower mean scores of Iranian children in comparison with 
Harris’s American norms, two points would seem to be worth noting. First, as pointed out 
above, similar differences have been observed with other western IQ tests tried on Iranian 
samples. The differences have persisted regardless of the type of the scale (i.e., verbal vs non- 
verbal), and the educational background of the subjects, from illiterate peasants and workers 
to medical students. In the case of DAM, it is at least reassuring to know that part of the 
observed difference may be due to the biased composition of Harris's normative groups. 
Nevertheless, in view of the fact that the non-verbal nature of the DAM test may encourage its
use and interpretation as a culture-free instrument of measurement in non-western developing
countries like Iran, the observed disadvantage of Iranian children in this test should be taken 
into consideration by people who may wish to apply American norms on the DAM in 
developing countries. In this respect, all users of the DAM may benefit from combining 
Harris's (1963) norms with later normative and psychometric evidence culled by Scott (1981). 
THE EFFECTS OF TEST MATERIALS AND THE ORDER OF
PRESENTATION OF THE MATERIALS ON YOUNG CHILDREN'S 
UNDERSTANDING OF CONSERVATION OF NUMBER 


By JAMES HANRAHAN, MARSHA YELIN AND SOCRATES RAPAGNA
(McGill University, Montreal, Canada) 


SUMMARY. A sample of 60 kindergarten and grade 1 children was administered conservation of 
number tasks. The effects of three variables on conservation of number were measured (familiarity 
— the degree of familiarity of the materials; order — the order of presentation of the materials; 
attachment — the nature of the relationship between the two rows of materials used in conservation 
of number tasks). Of these three variables, only familiarity was significant. Children appear to find 
conservation of number easier when tested with materials familiar to them as compared to 
unfamiliar materials. 


INTRODUCTION 


Many researchers have attempted to construct a developmental scale of intelligence based 
on Piagetian theory (Uzgiris and Hunt, 1975; Kingma and Koops, 1983; Carroll et al., 1984). 
One criterion for such a scale is that children consistently pass items up to a given intellectual 
level and then consistently fail items beyond that level. Researchers have had great difficulty 
constructing tests that satisfy this criterion (Tuddenham, 1970; Silverstein et al., 1975). When
items are chosen that should be passed by children at a given intellectual level, children 
typically solve some items correctly but fail to solve others. 


Such inconsistencies have been described by Piaget (1971) as time lags. An example of a 
time lag (Piaget, 1952) is that 7-year-old children do not understand that there are more beads 
than brown beads when given a collection of brown and white beads. They do understand, 
however, that there are more children than girls in their class of boys and girls. The 
classification problem is the same in each situation but the material to be classified differs. 


Researchers have concentrated on conservation tasks, particularly conservation of 
number, when investigating the time lag phenomenon (Lloyd, 1971; Brown, 1973; Yelin, 
1979). The usual testing procedure is to have the experimenter set out 10 wooden blocks in a 
row and have the child match the experimenter's row by placing 10 new blocks in a one to one 
correspondence with those of the experimenter. The experimenter then gathers his row of 
blocks into a small pile and asks the child, “Who has more blocks now . . . you or me?” If the
child, despite the perceptual transformation of the experimenter's blocks, answers that both 
rows still have the same number of blocks, then the child is said to have conserved number. 


Typically there are two rows of identical objects in a conservation of number experiment. 
It is possible, however, to use different objects in each row. For example, the experimenter's 
row may contain baseball bats while the child's row may contain baseball players. When there 
is a close relationship between both rows of objects, the objects are said to be in provoked one 
to one correspondence. When, instead, the same objects such as wooden blocks are used in 
both rows, there is little attraction between the rows of objects and they are described as being 
in spontaneous one to one correspondence. According to Piaget (1952), children find 
conservation of number easier with provoked materials than with spontaneous materials. 


There is also some evidence that familiarity with the test materials influences success on 
conservation tasks — familiar objects being easier to conserve than unfamiliar ones (Kahn and 
Reid, 1975). As well, the order of presentation of familiar and unfamiliar materials may affect 
responses to conservation problems. According to Kahn and Garrison (1973), a conservation 
task using unfamiliar material is easier to solve when preceded by a conservation task using 
familiar materials. 


The present study was undertaken in an effort to identify factors that influence the time
lag phenomenon. It was decided to focus on conservation of number because this
conservation task permitted considerable flexibility when choosing materials. Three variables
were chosen as having a possible influence on conservation of number: (1) familiarity, the
degree of familiarity of the materials; (2) order, the order of presentation of the materials; and
(3) attachment, the nature of the relationship between the two rows of materials (this
relationship could be either provoked or spontaneous). It was hypothesised that each of these
three variables would cause time lags when solving conservation of number problems.


METHOD 
Sample 
A sample of 60 kindergarten and grade 1 children was drawn from a public elementary 
school in the Montreal area. All children in the sample were judged to be of at least average 
intelligence by their classroom teachers. The mean age of the sample was 6 years 8 months. 
The sample included 36 boys and 24 girls. 


Test materials 

The materials for the experiment had to represent a provoked relationship or a 
spontaneous relationship between the two rows of objects. As well, one set of provoked 
objects had to be familiar to the child, the other set unfamiliar. Similarly, one set of 
spontaneous objects was required to be familiar, a second set unfamiliar. Small dolls with 
matching beds were chosen to satisfy the familiar-provoked condition while nuts and 
matching bolts were chosen for the unfamiliar-provoked condition. Chiclets brand chewing 
gum was chosen to meet the familiar-spontaneous condition while common faucet washers 
met the unfamiliar-spontaneous condition. Before participating in the experiment, each child 
was shown the test materials and asked to explain their possible uses. All children in the 
sample recognised the familiar materials; only three children recognised any unfamiliar 
materials. 


Design 

The 60 children were randomly assigned to one of four groups. The groups attempted 
conservation of number using the following materials: group 1 (dolls-beds, nuts-bolts); group 2
(nuts-bolts, dolls-beds); group 3 (Chiclets, washers); and group 4 (washers, Chiclets). As a
result of this procedure, the experiment was balanced for each of the three variables — 
familiarity, order, and attachment. 


The data were analysed as a 2 × 2 × 2 design with repeated measures on the last factor.
The between-groups factors are attachment and order. The within-subjects factor is 
familiarity. The MANOVA subprogram of SPSSX was used to carry out the analysis. 
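The counterbalancing described above can be made explicit by listing the four groups and checking the balance of the design. Attachment and order each have two groups per level, together they fill a 2 × 2 between-groups layout, and every group meets one familiar and one unfamiliar set. This sketch assumes that "order" means whether the familiar or the unfamiliar materials came first, which follows from the group listing. The identifiers are illustrative.

```python
from collections import Counter

# Materials given to each group, in order of presentation (from the Design section)
GROUPS = {
    1: ("provoked", ("dolls-beds", "nuts-bolts")),
    2: ("provoked", ("nuts-bolts", "dolls-beds")),
    3: ("spontaneous", ("chiclets", "washers")),
    4: ("spontaneous", ("washers", "chiclets")),
}
FAMILIAR = {"dolls-beds", "chiclets"}

def order_of(materials):
    """Between-groups 'order' factor: which kind of material came first."""
    return "familiar first" if materials[0] in FAMILIAR else "unfamiliar first"

attachment_levels = Counter(att for att, _ in GROUPS.values())
order_levels = Counter(order_of(mats) for _, mats in GROUPS.values())
cells = Counter((att, order_of(mats)) for att, mats in GROUPS.values())
# Within subjects: every group meets exactly one familiar and one unfamiliar set
familiarity_within = all(sum(m in FAMILIAR for m in mats) == 1 for _, mats in GROUPS.values())

print(attachment_levels, order_levels, familiarity_within, sep="\n")
```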


Scoring 

A scoring system similar to that employed by Piaget (1952) was used in the present study. 
Children who were unable to place objects in one to one correspondence were given a score of 
one. Establishing one to one correspondence resulted in a score of two while conserving 
number was scored as three. 
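The three-level scoring rule can be written as a small function. This is a sketch only, and the enum and function names are hypothetical. The text does not say how a child's later answer is treated after failing to establish correspondence, so the sketch assumes such a child scores one regardless of that answer.

```python
from enum import IntEnum

class ConservationScore(IntEnum):
    NO_CORRESPONDENCE = 1  # could not place objects in one to one correspondence
    CORRESPONDENCE = 2     # established correspondence but did not conserve
    CONSERVES = 3          # judged the rows equal after the transformation

def score_trial(made_correspondence: bool, judged_equal: bool) -> ConservationScore:
    """Score one conservation-of-number trial on the 1-3 scale (hypothetical helper).
    Assumes a child without one to one correspondence scores 1 whatever they answer."""
    if not made_correspondence:
        return ConservationScore.NO_CORRESPONDENCE
    return ConservationScore.CONSERVES if judged_equal else ConservationScore.CORRESPONDENCE
```

Each child receives one such score per set of materials. Because IntEnum values behave as integers, the scores can be averaged directly into cell means.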


RESULTS 
The results of the analysis of variance are presented in Table 1. The cell means are 
presented in Table 2. Only the main effect dealing with familiar versus unfamiliar materials 
was significant (P < 0.001). It would appear that children were able to conserve number more
easily when the materials with which they were working were familiar to them. 
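Each F ratio in the analysis of variance is the effect's mean square divided by the error mean square of its own stratum. Between-subjects effects are tested against the between-subjects error (0.880), and effects involving familiarity are tested against the within-subjects error (0.268). The sketch below recomputes the ratios from the printed mean squares. They agree with the tabled F values to within the rounding of those mean squares.

```python
# Mean squares from Table 1: (effect, stratum, MS, tabled F)
ERROR_MS = {"between": 0.880, "within": 0.268}
EFFECTS = [
    ("Attachment", "between", 2.700, 3.069),
    ("Order", "between", 0.300, 0.341),
    ("Attachment x Order", "between", 0.033, 0.038),
    ("Familiar", "within", 3.333, 12.444),
    ("Attachment x Familiar", "within", 0.000, 0.000),
    ("Order x Familiar", "within", 0.533, 1.991),
    ("Attachment x Order x Familiar", "within", 0.133, 0.498),
]

def f_ratio(effect_ms, stratum):
    """F for a 1-df effect: its mean square over the error mean square of its stratum."""
    return effect_ms / ERROR_MS[stratum]

for name, stratum, ms, tabled_f in EFFECTS:
    print(f"{name:32s} F = {f_ratio(ms, stratum):6.3f}  (Table 1: {tabled_f})")
```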


In order to study the relationship in more detail, a cross-tabulation was made for the 
familiar and unfamiliar scores for all subjects (SPSS Crosstabs). The results of this cross- 
tabulation are found in Table 3. Interestingly, 39 children are able to conserve number 
perfectly when using familiar materials. Of these 39, only 22 can conserve with unfamiliar 
materials. The remaining 17 have either moderate or extreme difficulty with the conservation 
task when working with unfamiliar materials. 
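The row and column percentages in a cross-tabulation such as Table 3 follow directly from the cell counts. Each count is divided by its row total or by its column total. The sketch below takes the frequencies from Table 3 and shows the arithmetic. The variable names are illustrative.

```python
# Frequencies from Table 3: rows = score with familiar materials,
# columns = score with unfamiliar materials (scores 1, 2, 3)
COUNTS = [
    [7, 1, 0],
    [1, 9, 3],
    [5, 12, 22],
]

row_totals = [sum(row) for row in COUNTS]
col_totals = [sum(col) for col in zip(*COUNTS)]
row_pct = [[round(100 * x / rt, 1) for x in row] for row, rt in zip(COUNTS, row_totals)]
col_pct = [[round(100 * x / ct, 1) for x, ct in zip(row, col_totals)] for row in COUNTS]

print(row_totals, col_totals)  # [8, 13, 39] [13, 22, 25]
print(row_pct[2])              # [12.8, 30.8, 56.4]
# Conservers with familiar materials who did not fully conserve with unfamiliar ones
print(row_totals[2] - COUNTS[2][2])  # 17
```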
TABLE 1
ANALYSIS OF VARIANCE

Source                              df     MS        F        P
Between Subjects
  Attachment                         1    2.700     3.069    0.085
  Order                              1    0.300     0.341    0.562
  Attachment × Order                 1    0.033     0.038    0.842
  Error                             56    0.880
Within Subjects
  Familiar                           1    3.333    12.444    0.001
  Attachment × Familiar              1    0.000     0.000    1.000
  Order × Familiar                   1    0.533     1.991    0.164
  Attachment × Order × Familiar      1    0.133     0.498    0.483
  Error                             56    0.268
TABLE 2
CELL MEANS

                          Familiar                        Unfamiliar
Attachment       Order 1  Order 2  Combined     Order 1  Order 2  Combined
provoked          2.667    2.667    2.667        2.500    2.533    2.517
spontaneous       2.333    2.400    2.367        2.300    2.067    2.183
combined          2.500    2.533    2.517        2.400    2.300    2.350
TABLE 3
CROSS-TABULATION OF THE FREQUENCIES OF FAMILIAR BY UNFAMILIAR SCORES

                                   Unfamiliar Material Scores
Familiar Material Scores           1        2        3      Row Total
1   Number of subjects             7        1        0          8
    Row per cent                87.5     12.5      0.0
    Column per cent             53.8      4.5      0.0
2   Number of subjects             1        9        3         13
    Row per cent                 7.7     69.2     23.1
    Column per cent              7.7     40.9     12.0
3   Number of subjects             5       12       22         39
    Row per cent                12.8     30.8     56.4
    Column per cent             38.5     54.5     88.0
Column Total                      13       22       25         60


DISCUSSION 


The results of the present study strongly suggest that children find conservation of 
number easier with familiar materials. They appear to have learned to conserve in specific 
instances where familiar materials are used but are unable to generalise to a rule to cover all 
instances. 


The current findings suggest that, in the development of a Piagetian-type scale of 
intelligence, all items should be presented using familiar materials. Such a requirement would 
demand a considerable amount of flexibility on the part of the test. It is unlikely that all 
populations with whom such a test would be useful would find a predetermined set of 
materials familiar. The test then should permit the inclusion or exclusion of certain materials 
depending on the population being tested. Whether such an instrument can have this degree of 
flexibility and still provide meaningful comparisons is a question that demands further study. 


Correspondence and requests for reprints should be addressed to Dr. J. P. Hanrahan, Faculty of 
Education, Department of Education Psychology and Counselling, McGill University, 3700 McTavish 
Street, Montreal, Canada H3A 1Y2. 
DELAYED RETENTION OF ORALLY PRESENTED TEXT WITH
PICTORIAL SUPPORT


By J. PEECK AND M. W. JANS
(University of Utrecht, Holland) 


SUMMARY. University students were presented with either a tape-slides presentation on an African 
tribe, the tape only, or the slides only, and took a retention test two days later. In comparison with 
the slides condition, the tape-slides condition retained significantly more information presented by 
text and pictures but significantly less information presented by pictures-only. There were no 
differences between the tape-slides condition and the tape condition in retention of unillustrated text
content, irrespective of its relationship to what was shown in the picture, but illustrated text content 
was better retained in the tape-slides condition. 


INTRODUCTION 


Since the early 1970s an increasing number of studies have been concerned with learning 
and retention of illustrated text in comparison with text without illustrations. One way to 
describe the outcome of this research is to analyse the results in terms of three categories of 
information (cf. Peeck, 1974; Levie and Lentz, 1982): information provided by text and 
illustrations, information provided by text alone, and information presented by only the 
pictures. 


As for the first of these categories, the general finding has been that the presence of
illustrations facilitates learning of text information that is actually depicted in the illustrations. In their
review, Levie and Lentz (1982) list 46 comparisons of learning illustrated text information
with and without pictures, and conclude that “the results . . . reveal an overwhelming
advantage for the inclusion of pictures” (p. 203).


A good deal less certain is the fate of the second category. Their survey of the existing 
evidence led Levie and Lentz to the conclusion that illustrations have no effect on learning text 
information that is not illustrated, i.e., they neither help nor hinder. It is, however, not very
clear how this failure to obtain effects should be interpreted. It could mean that, as Levie and 
Lentz suggest, the presence of illustrations simply does not affect retention of non-illustrated 
text content, but various alternative interpretations could be given. It could, for instance, 
mean (Peeck, 1974) that in the pertinent studies positive and negative influences were 
simultaneously effective (on different text elements) but cancelled each other out. It could also 
mean that other characteristics of the experimental design in the studies under review 
prevented effects from occurring, and that with, for instance, different learning materials or 
different dependent measures effects might well be found (Peeck, 1987). Several recent 
studies have shown that this may be so. In one recent experiment, for instance, Van Dam et al. 
(1986) obtained inhibition of retention of non-illustrated text content when university students 
read and gave oral recall of a 560-word text on a historical subject that consisted of 20 
paragraphs (called “scenes”) dealing with subtopics. Illustration of half the subtopics led to a
significant drop in gist recall of the unembellished scenes, in comparison with recall of the 
same scenes when no illustrations accompanied the text. Similar results were reported by 
Gunter (1980), who in an experiment on recall of television news items, concluded that 
“picture (i.e., illustrated) items may inhibit learning of non-pictorial items” (p. 127). 


Somewhat in contrast, Rusted (Rusted, 1984; Rusted and Hodgson, 1985) has argued, 
with experimental support, that, though positive effects for unillustrated text content may be 
absent in narrative text, facilitation does occur when pictures accompany expository learning 
material for children. In an expository text, pictures, according to Rusted, provide “a
conceptual framework for comprehension of the passage, promoting distinctiveness in the
encoding of that text, and hence more efficient subsequent recall” (Rusted, 1984). In contrast,
children's encoding and recall of information from a story text are primarily dictated by the
use of the story schema (cf. Mandler and DeForest, 1979); since this schema “provides an
overlearned and highly effective strategy for organisation of the text at encoding, the
alternative framework provided by the picture is overlooked” (Rusted and Hodgson, 1985, p.
289).


It should be noted that Rusted's suggestions and findings are not necessarily 
contradictory to what was reported by Van Dam et al. and by Gunter. Their research dealt 
with retention of more or less separate topics, albeit of an expository kind, whereas the 
texts considered by Rusted generally concern a well-integrated treatment of one particular topic. 


As for the retention of the third category of information, provided by pictures only, 
unfortunately only a few experimental results are available. The outcome of studies assessing 
retention of pictures presented without text (e.g., Nickerson, 1968; Standing et al., 1970) 
suggests that considerable memory for illustrations in illustrated text could well be expected, 
though the actual performance will probably depend on such factors as type of information 
tested (Mandler and Johnson, 1976) and the assessment mode (Spoehr and Lehmkuhle, 1982). 
The evidence that has been obtained for this category of information is somewhat 
inconclusive. Substantial memory for picture-only information was, for instance, reported by 
Peeck (1974; see also Peeck, 1985), but no retention of this kind was found by Jahoda and his 
associates (Jahoda et al., 1976). 


In the present study, data regarding all three categories of information were collected in a 
design that included three groups of subjects who were presented with a text plus illustrations, 
text only, or illustrations only. The latter condition — usually absent in illustrated text 
research — was included to enable examination of how retention of pictorial information is 
affected by text, and to provide a baseline for the retention of text-only information. Apart 
from questions dealing with illustrated text and picture-only information, the retention test 
(multiple choice) contained two types of unillustrated text information: unillustrated elements 
from the text that bore some relationship to illustrated text content, and text elements that did 
not. These two types were included in order to see whether facilitation for the former could be 
achieved even when retention of the latter type was not affected. Such facilitation could occur 
if the enhanced memory for illustrated text elements spread to other text elements closely 
associated with the benefiting content. Though in an early test of this speculation with fifth 
graders (Peeck, 1972; cf. Peeck, 1987) no differential retention effects were found, it was 
decided that the possibility deserved some further experimentation. 


METHOD 
Materials 
The learning material was a 1650-word passage providing factual information about an 
African tribe, the Dogon. The text described topics such as way of life, means of existence, 
and customs of the tribe, and was illustrated by five colour photographs, depicting some of 
the information on the topics treated in the passage. A tape-recorded version of this text, with a 
playing time of about 11 minutes, was made. The pictures were made into colour slides. 


The post-test contained 60 multiple choice questions, each consisting of a correct 
response and three distractors. There were 15 questions dealing with information from each of 
the following four categories: information presented by pictures and text (PT), information 
provided by pictures only (P), and two types of information presented by text only: 
information that bore a direct relationship to illustrated text content (Tpt), and information 
that did not (T). An example of the former is a question about who makes the masks that are 
used during funeral rites and are shown on one of the slides. An example of the latter is a 
question about the number of blacksmiths — not portrayed — that live in a particular village. 


Subjects 

Forty-five students from Utrecht University participated in the study. They were 
randomly assigned to one of three conditions: tape only, slides only, or tape and slides. 


Procedure 

Subjects were tested one or two at a time. Prior to the presentation of the learning 
material, they were instructed to try to remember as much as possible of the passage they 
were about to hear and/or of the slides they were going to see. Subjects in the tape-slides 
condition listened to the passage while they viewed each slide for as long as it was relevant to 
the topic discussed in the text. The average viewing time per picture was about 2 minutes. 
Subjects in the tape condition just listened to the text, while subjects in the slides condition 
viewed each picture for as long as subjects in the tape-slides group. 


Two days after presentation of the learning material, the students took the 60-item post- 
test. They were told to indicate the correct response and to rate their confidence for each 
answer on a scale provided. The scale ranged from 1 to 5, with 1 indicating low confidence and 
5 indicating high confidence. 


Subjects in the tape condition were told the test contained questions about slides which 
students in another condition had seen, and they were asked to try to answer these questions 
too so that a baseline for this category could be developed. A similar instruction, mutatis 
mutandis, was given to the subjects in the slides condition. 


RESULTS 


Mean post-test scores for the four types of questions, and mean confidence ratings for correct 
responses, are shown in Table 1. 


TABLE 1

MEAN POST-TEST SCORES AND CONFIDENCE RATINGS WITH STANDARD DEVIATIONS IN BRACKETS FOR
THE FOUR CATEGORIES OF INFORMATION

                          Post-test scores                 Confidence ratings
Category of
information          P       T      Tpt      PT        P       T      Tpt      PT

Slides        M     8.43    5.29    4.00    8.29      4.01    1.95    1.76    4.29
              SD   (2.21)  (1.73)  (1.80)  (1.59)    (0.61)  (0.72)  (0.83)  (0.35)
Tape          M     3.79   10.29   10.07   10.43      1.26    4.16    4.11    4.00
              SD   (1.37)  (1.77)  (2.02)  (2.24)    (0.69)  (0.44)  (0.60)  (0.70)
Tape-slides   M     5.64    9.57    9.36   12.50      2.85    3.92    3.91    4.51
              SD   (1.69)  (2.10)  (1.74)  (1.65)    (0.50)  (0.61)  (0.41)  (0.36)


A one-way analysis of variance with subsequent Newman-Keuls comparisons was 
performed on the scores of each question type. For each category, significant effects were 
obtained, the F values being: F = 18.12 for PT; F = 20.56 for P; F = 29.11 for T; and 
F = 44.83 for Tpt; in each case df = 2, 39, P < 0.001. 
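As a rough check on how these F ratios relate to Table 1, the F ratio of a one-way between-groups analysis of variance can be recomputed from the tabled means and standard deviations alone. The sketch below is a minimal illustration under stated assumptions, not the authors' analysis: it assumes three equal groups of n = 14 (inferred from df = 2, 39, since 39 within-groups degrees of freedom imply 42 scores), treats the tabled SDs as sample SDs, and uses a hypothetical helper name. It does not attempt the Newman-Keuls comparisons.

```python
def one_way_f_from_summary(means, sds, n):
    """Hypothetical helper: one-way between-groups ANOVA F from group summaries.

    Assumes every group has the same size n and that sds are sample SDs
    (computed with an n - 1 denominator).
    """
    k = len(means)
    # With equal group sizes, the grand mean is the unweighted mean of the group means.
    grand_mean = sum(means) / k
    # Between-groups sum of squares: n times the squared deviations of group means.
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    # Within-groups sum of squares: each group contributes (n - 1) * variance.
    ss_within = (n - 1) * sum(sd ** 2 for sd in sds)
    df_between = k - 1
    df_within = k * (n - 1)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, (df_between, df_within)


# PT scores from Table 1 in the order slides, tape, tape-slides; n = 14 is an assumption.
f_pt, df = one_way_f_from_summary([8.29, 10.43, 12.50], [1.59, 2.24, 1.65], n=14)
print(f"F{df} = {f_pt:.2f}")  # prints: F(2, 39) = 18.13
```

For the PT scores this gives F of about 18.13, in line with the reported 18.12. Because the tabled values are rounded to two decimals, recomputed values for the other categories may differ somewhat from those reported.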


As could be expected on the basis of previous findings, illustrated text content (PT) was 
better retained in the tape-slides than in the tape condition, but both conditions also differed 
significantly from the condition that viewed the slides without the text. On picture-only 
information (P), however, the latter condition obtained a significantly better result than the 
tape-slides condition, which in turn did better (P < 0.05) than the tape condition, which 
performed at chance level. 
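For reference, the chance level for one 15-item category can be worked out under a simplifying assumption: that a subject guesses at random among the four options on every item, so that the number correct X is binomial with 15 trials and success probability 1/4.

```latex
E[X] = 15 \times \tfrac{1}{4} = 3.75,
\qquad
\mathrm{SD}(X) = \sqrt{15 \times \tfrac{1}{4} \times \tfrac{3}{4}} = \sqrt{2.8125} \approx 1.68
```

On this assumption, the tape group's mean of 3.79 on picture-only (P) items lies very close to the expected chance score.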


On the two text-only categories, T and Tpt, the tape condition did slightly but 
consistently better than the tape-slides condition, but the differences failed to reach 
significance. Both conditions, of course, significantly outperformed the slides condition 
which scored at chance level. 


As can be seen in Table 1, the confidence ratings reveal patterns that are very similar to 
those of the post-test scores. Only for the PT category did a somewhat different 
pattern emerge, with only the difference between the tape-slides condition and the tape 
condition reaching significance (P < 0.05). 

DISCUSSION


The results of this study contain several features that are of interest for the understanding 
of memory for an illustrated text. 


For the category of illustrated text content (PT), the significant difference between the 
tape-slides and tape-only condition gives further support to the general finding of earlier 
studies that learning information in the text that is also shown in pictures will be facilitated. 
An interesting feature for this category is the comparatively poor performance of the slides- 
only condition. It indicates the importance of text for the understanding of pictures, and is in 
accordance with studies that have shown how interpretation and retention of pictorial 
information may be enhanced by sentences or captions (Bacharach et al., 1976; Carr et al., 
1977; cf. Olson, 1970). 


Further inspection of the results in the tape-slides and slides-only condition, however, 
shows that the effect of text on memory for picture content is an interactive one: while 
memory for pictorial information referred to in the text is enhanced, performance on content 
provided by the illustrations only (P) is significantly lower. The poor performance of the 
subjects in the tape-slides condition in comparison with the slides group suggests that, under 
the influence of the text, these subjects focused too narrowly on picture elements directly 
relevant to the text. At the same time, however, it should be noted that their performance 
still differed significantly from the chance level demonstrated in the tape condition. 


There is no evidence that retention of non-illustrated text content benefited from the 
presence of the illustrations — not even when this content (Tpt) bore a clear relationship to 
information (PT) presented by pictures and text. If anything, retention of information from 
the T and Tpt categories was poorer when illustrations were present than when they were not, 
but the difference failed to reach significance. The results, therefore, support the general 
conclusions of Levie and Lentz (1982), rather than the findings by Van Dam et al. (1986) and 
Gunter (1980), or the hypothesis advanced by Rusted (1984). The failure to obtain facilitation, 
even though expository text was used, may be partly due to the oral presentation of the 
learning material, which perhaps made subjects focus their attention on the illustrated text 
content to a stronger degree than they would have done with a written text. The absence of a 
facilitative effect for unillustrated text in two other recent studies dealing with orally presented 
expository text (Levin and Berry, 1980; Nugent, 1982) supports this interpretation. The failure 
to find facilitation may also be due to the length of the text, in relation to the number of 
pictures. With extensive texts with few pictures (and consequently many unillustrated text 
elements) the benefits of illustration for the text as a whole may be considerably smaller than 
when short passages are used, as was the case in Rusted's studies (e.g., Rusted and Hodgson, 
1985). 


Correspondence and requests for reprints should be addressed to Dr. J. Peeck, Rijksuniversiteit 
Utrecht, Psychologisch Laboratorium, Vakgroep Psychonomie, Heidelberglaan 2, Trans II, 
3584 CS Utrecht, Holland. 
BOOK REVIEWS


HARGREAVES, D. (1986). The Developmental Psychology of Music. London: 
Cambridge University Press, pp. 260, £8.95 pbk, ISBN 0-521-31415-1. 


This book comprises a survey of the present state of music psychology and music 
education theory. In the author's view, this material should be considered alongside those 
psychological theories and research projects which could prove useful in constructing a 
developmental psychology of music. It will be welcomed enthusiastically by music educators 
everywhere, many of whom have long wished to see their subject given more consideration 
within the field of developmental psychology. 

The author has chosen to spread his net very widely indeed, including accounts of the 
work of such diverse figures as Freud and the social psychologist Konécni, whose recent work 
is concerned with the interaction between musical preference and the social context of 
listening. A summary of the methods of musical instruction by Kodaly, Orff, and Suzuki is 
included; David Hargreaves rightly observes that each ‘‘embodies an implicit view of the 
nature of child development and of the role that music ought to play within it’’ (p. 221). There 
are chapters on the relation between developmental psychology and music to date; on music in 
the pre-school child and the school child; on the development of responses to music, including 
a survey of tests designed to examine individual responses to music (a project closely related 
to the study of personality); on creativity, personality, and musical development; and on 
social psychology and musical development. 

David Hargreaves ends his survey with a chapter on developmental psychology and music 
education, concluding that further research is needed into a cognitive-developmental 
approach along Piagetian lines. He rightly states that up to the present music education has 
explored the behavioural approach almost exclusively. He views the study of the psychology 
of music as extremely complex, and demanding a readiness to explore the interaction between 
cognitive, social, and affective dimensions of development. 

Such a wide-ranging brief as that constructed by David Hargreaves is necessarily lacking 
in depth in certain areas. Many would take issue with Hargreaves' equation of the theories of 
Noam Chomsky and Heinrich Schenker, for example, as “essentially concerned with 
describing the structures of language and music respectively” (p. 10). Surely Noam Chomsky 
was concerned with the nature of language itself, whereas Heinrich Schenker attempted to 
provide musical insights into certain carefully selected works of ‘‘genius’’ (Schenker’s own 
selection) belonging to the Western classical tradition and written between approximately 1700 
and 1870. He most certainly was not concerned with the structure of all music, as Hargreaves 
suggests. Hargreaves advocates an inclusive model of research, using musical material from 
many cultures rather than only music from the Western classical tradition, as has been the case in 
the majority of studies in the past. Comparisons between differing musical traditions, 
however, and the array of variables involved which confront the developmental psychologist, 
are given very short shrift. 

In his conclusion, Hargreaves states that “In spite of the immense potential for 
collaboration between workers in the two fields (music and psychology), which would be to 
their mutual benefit, the detailed connections remain largely unexplored” (p. 226). Could this 
be because musical practice is not easily separated into measurable components? Hargreaves 
optimistically thinks that “research ought to enable us to investigate the interdependence of 
‘formal’ and ‘intuitive’ musical skills, for example, as well as to assess the age levels at 
which different component skills are most appropriately taught”. It could be that the task of 
deciding precisely which skills are “formal” and which “intuitive” proves more difficult than 
the author anticipates. 

The book contains a separate subject and author index and the usual bibliography. It is to 
be highly recommended as an introduction to the subject, and should remain useful in the 
foreseeable future as a handy work of reference. 

JENNY HUGHES 
HOGG, J., and RAYNES, N. V. (Eds.) (1987). Assessment in Mental Handicap. 
Beckenham: Croom Helm, pp. 289, £22.50 hbk, ISBN 0-7099-3744-X; £10.95 
pbk, ISBN 0-7099-3745-8. 


To cover “the full gamut of assessment procedures for use with mentally handicapped 
people” in a book “aimed at all teachers, psychologists and health professionals” is a tall 
order. To attempt this in a single edited volume is more daunting still. That the book comes 
close to doing all these things is to the credit of the editors, who have done well to ensure a high 
degree of uniformity in purpose and style in the eight contributions which the book contains. 
Indeed, for the most part the reader is not aware of differences in authorship, and the book 
reads almost as a single-authored book. 

In the introductory chapter the editors suggest that the present volume was intended to 
constitute an updated successor to Peter Mittler's Psychological Assessment of Mental and 
Physical Handicap, which they acknowledge as having been the authoritative text on the 
subject since its publication 15 years ago, but which clearly does not take into account recent 
changes in legislation (e.g., the 1981 Education Act), in provisions (e.g., the move towards 
community placement) or in attitudes towards assessment (e.g., the emphasis on 
intervention). In covering basic issues I think the editors have succeeded in doing this. I would 
certainly commend the book to anyone looking for a basic introductory text. In terms of the 
book's target audience, this is certainly the book for teachers and health professionals — 
psychologists may find it useful as a summary of ideas not readily available elsewhere in a 
single volume. A second companion volume for psychologists and researchers may still be 
needed to complete the task of updating Mittler's earlier work. 

The eight chapters are remarkably comprehensive. Following the general introduction, 
Mike Berger and Bill Yule provide a good overview of issues in psychometric assessment which 
I think teachers in training and health professionals will find very readable and informative. 
This simple and clear style is maintained in succeeding chapters on ‘‘Early Development and 
Piagetian Tests’’ and ‘‘Adaptive Behaviour Scales’’ written by the editors and subsequently in 
chapters on “Physical Development, Hearing and Vision”, “Criterion Referenced Tests” and 
“Direct Observation” by Judy Sibha, Chris Kiernan and Glynis Murphy. The one chapter 
which doesn't maintain quite the same focus as these is one on “Behaviour Disturbance and 
its Assessment”, where Ivan Leudar and William Fraser seem to be writing more for the 
researcher and the psychologist than other contributors; not that there is anything wrong with 
the chapter, simply that it appears, in part, to be aimed at a different target audience. 

The book also contains a very useful Appendix which lists full details of all of the tests 
and assessment instruments referred to in the text. Used in conjunction with the evaluative 
comments in the text, it gives readers a useful guide to whether or not a test will be of any 
value to them. Information provided here includes basic facts about each test, such as the 
publisher and the age range for which it is suitable, but also a key reference for those wishing to 
read about the background to the test or an example of its application, together 
with useful comments on whether or not the test requires special training, how it is 
administered, etc. 

In sum, I suspect that what we have here is a book which is likely to become a standard 
introductory text in the field. It is written by acknowledged experts who have generally 
managed to write in a simple and direct style which makes it accessible even to those with a 
minimum of previous psychological training. It is a book which can be recommended for 
purchase and is a must for libraries and staff rooms in any organisation which caters for the 
mentally handicapped. 

GRAHAM UPTON 


PHILLIPS, D. C. (1987). Philosophy, Science and Social Inquiry. Contemporary 
Methodological Controversies in Social Science and Related Applied Fields of 
Research. Oxford: Pergamon, pp. 228, £17.50 hbk, ISBN 0-08-0334105; £9.95 
flexicover, ISBN 0-08-0334113. 


This book sets out to provide an exposition and critique of recent debates in the 
philosophy of science in a form accessible to non-philosophers and to use these debates to 
illuminate some contemporary methodological controversies in the social sciences and related 
applied fields. The material is presented in three parts. The first covers developments in the 
philosophy of science over the past three or four decades, including the work of Popper, 
Lakatos, Feyerabend, Hanson, Kuhn and Winch among others. Part two addresses debates 
concerning the methodological unity or otherwise of the natural and social sciences 
(naturalism versus anti-naturalism), including a discussion of the rise of hermeneutics and 
holistic approaches and their application in the social sciences. In the third part the perspective 
developed in the first two, “a mild Popperianism and naturalism” (p. 116), is applied to 
selected topics in the area of cognitive structure and development. There are chapters on the 
structure of knowledge debate, on how to describe students' cognitive structures, on Piaget's 
views concerning change and development of cognitive structures and on Kohlberg's stage 
theory of moral development. 

One weakness of the book is its organisation. It is based on papers published over 
the last ten years or so and only the brief introductory chapter to each part is completely new. 
Although the original papers have been extensively revised to produce the other eleven 
chapters, this has not resulted in a systematic exposition of the material, at least in the first 
two parts. Recent philosophical developments, for example, are dealt with neither in 
chronological nor in any evident logical order. Discussions of individual philosophers occur in 
several different chapters, with some inevitable repetition. Bits of “applications” (of 
philosophical ideas to current social research) appear in chapters otherwise largely devoted to 
exposition. The considerable amount of cross-referencing is helpful but does not adequately 
compensate for this lack of systematic organisation. 

As a textbook, therefore, this has some limitations. While it is written in a lively and 
entertaining manner, the lack of organisation is confusing and the reader is often unsure what 
stage the argument has reached. In places the treatment of issues and writers is somewhat 
sketchy, and there are some confusions, for example in the discussion of the notion of 
incommensurability. 

The organisational disarray is to some extent mitigated by the unifying perspective 
provided by the author's commitment to Popperian critical rationalism extended and 
modified by Lakatos's ideas concerning the methodology of scientific research programmes. However, 
one searches in vain for a systematic discussion of the weaknesses of Popper and Lakatos. 
Thus, for example, the final chapter demonstrates convincingly that the research programme 
defined by Kohlberg's stage theory of moral development is, in Lakatosian terms, 
“degenerating”, but no attempt is made to defend Lakatos's ideas themselves. The reader 
must make good this deficiency of the text by pursuing the references in the bibliography, 
which provides a useful means of access to critical discussion of Popper and Lakatos. 

It is also worth noting that the author is advocating naturalism in a fairly weak sense. 
Thus he acknowledges a hermeneutical element in the social sciences and he is not against the 
use of qualitative methods. Neither precludes the social sciences from being naturalistic in the 
sense of their subject matter being susceptible to the application of the Popperian “method of 
conjecture and refutation”, that is, the exposing of hypotheses to criticism and to the 
possibility of falsification. (The term “method” is used throughout somewhat loosely to cover 
both individual research methods and what is often called “methodology”.) 

This book is not wholly successful in achieving its aims but within its limitations it 
provides some illuminating discussions (for example, in chapters 4, 8 and 12). If it had been 
presented as a collection of separate papers on related themes with a substantial overview 
chapter, fewer of the reader's expectations would be disappointed. 

CAROLYN STONE 


RICHARDS, M., and LIGHT, P. (Eds.) (1986). Children of Social Worlds. Cambridge: 
Polity Press, pp. 327, £25.00 hbk, ISBN 0-7456-0099-9. 


Whilst it would seem that this book is not written especially for the lay reader, it is 
certainly a book which many on the fringe of the disciplines concerned can both make sense of 
and enjoy. The fourteen contributions are placed thoroughly in the context of human 
development and interaction, and the editors claim (rightly, in my opinion) that the volume 
represents a sustained attempt to avoid unduly individualistic approaches. 

It seems, therefore, that this tome is in the “currently received tradition”, that is, in the 
tradition which takes the backbone of psychology to mean little unless there is offered some 
understanding of its context and connectedness. The contributions, powerful if somewhat 
uneven, are by writers in varying degrees known or extremely well known in their fields. 

If (as one suspects) the editors, Martin Richards and Paul Light, had difficulty in shaping 
the four sections and holding the cohesion and coherence throughout the book, then their 
endeavours are rewarded with considerable success. Read in separate chapters, or in larger 
measures, this volume inducts the reader powerfully and clearly into the latest state of the art, 
be it psycholinguistics or feminism. The range, too, is interesting. The title would lead the 
reader to expect work on the birth cohort studies, or on issues in the social development of 
children. Even so, it is a pleasure to read Michael Wadsworth — who manages such a secure 
and easy grasp of three complex studies, yet presents the material in a direct and non-jargon 
way which is simple to digest and allows a feeling of almost discursive interaction with the 
author. The title did not lead me to expect certain delights however; for instance, the work by 
Ann Oakley on feminism and motherhood. Surely a gem of its kind, informative, easy, 
succinct and above all sympathetic. 

Unfortunately not all the writing is so easy to assimilate. Considerable density of prose 
(displayed by Robinson) may be necessary in order to cover the ground in the short space 
given, but a little unpacking of those complicated sentences on meaning and message would 
have made the mastery of this difficult area more approachable. Yet, despite that criticism, it 
is clear to me that the Robinsons have been pioneers in a most complex area, working for 
many years to build a more accurate picture of how children discriminate, understand, 
attribute causality or ‘‘know that they do not know”. 

For this reader the best chapter of all was the chapter by Cathy Urwin, on developmental 
psychology and psycho-analysis. Urwin's views of phantasy not being equated with cognition 
and not being capable of reduction to a similar cognitive structure offer an opportunity to 
explore the value of other areas of insight which distinguish much that is psychoanalytic, 
profound (yet) dangerously unscientific. As psychologists we do perhaps need to be reminded 
that “we know more than we can tell”! 

All in all, a lively read, full of much to reward and revisit. 

JANE FALK 


SEYMOUR, P. H. K. (1986). Cognitive Analysis of Dyslexia. London: Routledge and 
Kegan Paul, pp. xi + 265, £26.50 hbk, ISBN 0-7100-9841-3. 


In his introductory chapter Seymour draws a distinction between “three possible levels of 
description and explanation”. The first involves “a statement of . . . publicly observable 
behaviour . . .”, the second “an analysis, stated in functional (or information processing) 
terms of psychological processes underlying the behaviours”, and the third “an account of the 
manner in which the mental functions underlying competence are realised in neural tissue” 
(p. 3). In this book he is concerned primarily with the second of these three levels. The 
essential features of the model which he uses are (i) a “semantic processor”, (ii) a 
“phonological processor”, and (iii) a “visual (graphemic) processor”. He postulates a 
“graphemic route” (“g”) and a “morphemic route” (“m”) from the visual processor to the 
phonological processor and a further “morphemic route” (“m”) from the visual processor to 
the semantic processor. 

With this model as a guide he tested 13 non-dyslexic subjects, aged between 10 and 12, 
and 21 dyslexic subjects, aged between 12 and 25, on a variety of tasks. For example, he 
measured the time needed to read aloud high frequency words, low frequency words, and non- 
words; he also measured the time needed to say “yes” or “no” in various decision tasks, for 
example whether a display of letters was or was not a word (“lexical decision”) and whether a 
second word was or was not an instance of the first, e.g., “furniture-table” (“semantic 
decision”). The results from the normal readers are used as a guide to indicate what is “par 
for the course”, and inferences are then made as to what routes are deficient in the case of the 
dyslexic subjects. His conclusion is that there is no single adverse factor “which produces a 
consistent set of impairments within the processing system” (p. 240); in this sense one must 
conclude that “developmental dyslexia is not a homogeneous category” (p. 239). 

Predictably the book has many merits. The objectives are clearly stated; the relevant 
experiments are carried out with care, and the style is scholarly and lucid. Why, then, does one 
feel uneasy? 

The main reason appears to be this. Seymour assumes, without discussion, that the 
models used for the understanding of acquired dyslexia are also appropriate in the case of 
developmental dyslexia. (In passing, one should note that the two types of researcher differ 
even in their use of the word ‘‘dyslexia’’: those working with brain-injured patients normally 
treat it as equivalent to ‘‘reading difficulty’’, whereas for those working in the developmental 
field reading problems are only a small part of a larger family of language deficits). In the case 
of the syndromes of acquired dyslexia (‘‘deep’’, ‘‘surface’’, etc.) the implication is of some 
relatively permanent state; and if the subject’s pattern of responding changes this is likely to 
mean only that he has developed alternative strategies by way of compensation. In 
developmental dyslexia, however, the case for such permanence is by no means established: to 
make a very obvious point, the dyslexic child can learn! It follows that his success or otherwise 
in reading, say, non-words could quite well be a function of the amount of teaching which he 
has so far received in the area of grapheme-to-phoneme correspondences. That there is a 
permanent slowness in processing symbolic information is, of course, well established; but to 
pass without adequate safeguards from “John failed at this task” to “John lacks the route for 
carrying out this task” is extremely dangerous. 

The research reported in this book, as Seymour would no doubt agree, can usefully be 
described as ‘‘theory-driven’’ rather than ‘‘problem-driven’’. This is not necessarily a 
weakness. In the case of any theory-driven research, however, it is important to ask how the 
results can be of help in solving practical problems. In this case it therefore seems pertinent to 
ask how the knowledge that a child lacks, or is weak on, a particular route helps the practising 
teacher. If a child does not know what speech sounds are represented by what letters the 
obvious policy for the teacher is to try to teach him! Then, if particular procedures are 
unsuccessful, alternative ones can be explored. In all this it is hard to see what can be done 
with the “three processors” model or any of its variants which could not equally well have 
been done without them. 

Finally, just because in terms of the model the phenomena turn out to be somewhat 
heterogeneous, one surely cannot conclude, as Seymour does, that developmental dyslexia is a 
heterogeneous condition simpliciter. No mention is made of the work of Liberman, Brady, 
Aaron and others whose evidence seems to point in the direction of homogeneity; and part of 
the шен in this area is that the criteria for homogeneity vary from one researcher to 
another. 

To many of us the question, “What sorts of task do dyslexics find difficult?”, is of more 
interest and importance (given the present state of knowledge) than the question, “Which 
routes (or mental structures) are impaired?” If this view is correct then Seymour's decision to 
concentrate on “level 2” (an account of his data in information processing terms) is a 
mistaken one. Perhaps the main lesson to be learned from this challenging book is that in the 
area of developmental dyslexia one should not allow one's theoretical model to steal the show. 


T. R. MILES